I had that happen too recently… Basically `rg x` would show nothing but `grep -r x` showed the lines for any x. Tried multiple times with different x, then I kept using grep -r at the time. After a few days, I started using rg again and it worked fine, but now I tend to run grep -r occasionally too, just to make sure.
Next time that happens, try looking at the paths, adding a pair of -u, or running with --debug: by default rg will ignore files which are hidden (dotfiles) or excluded by ignore files (.gitignore, .ignore, …).
I use "grep" to search files (it should never skip any unless I tell it to do otherwise) and "git grep" to be a programmer searching a codebase (where it should only look at code files unless I tell it to do otherwise). Two different hats.
I wouldn't want to use tools that straddle the two, unless they had a nice clear way of picking one or the other. ripgrep does have "--no-ignore", though I would prefer -a / --all (one could make their own with alias rga='rg --no-ignore')
I don't remember why I didn't switch from ag, but I remember it was a conscious decision. I think it had something to do with configuration, rg using implicit '.ignore' file (a super-generic name instead of a proper tool-specific config) or even .gitignore, or something else very much unwarranted, that made it annoying to use. Cannot remember, really, only remember that I spent too much time trying to make it behave and decided it isn't worth it. Anyway, faster is nice, but somehow I don't ever feel that ag is too slow for anything. The switch from the previous one (what was it? ack?) felt like a drastic improvement, but ag vs. rg wasn't much difference to me in practice.
Totally agree with the author of rg here. Config names should be unambiguous. Anyway, must have been something else, then. As I've said, I cannot remember what was the specific problem, only that it wasn't quite compatible with the workflow I was used to, and now it'd take another full-in attempt to switch to figure out what was so annoying to me back then.
I was just trying to remember why I switched _to_ rg.
It's nice and everything, but I remember being happy with the tools before (I think i moved from grep to ack, then jumped due to perf to ag and for unremembered reasons to pt.)
It took me a while, but I remembered I ran into an issue with pt incorrectly guessing the encoding of some files[0].
I can't remember whether rg suffered from the same issue or not, but I do know after switching to rg everything was plain sailing and I've been happy with it since.
For me it was trying to add a filter to search CMake files to ag and then realizing that the code had some rather stupid design decisions that prevented it. I wrote a pull request that fixed enough things to add the filter, got ignored by the maintainer and later realized that other people had already written the same filter and were ignored too.
One thing I learned over the years is that the closer my setup is to the default one, the better. I tried switching to the latest and greatest replacements, such as ack or ripgrep for grep, or httpie for curl, just to always return to the default options. Often, the return was caused by a frustration of not having the new tools installed on the random server I sshed to. It's probably just me being unable to persevere in keeping my environment customized, and I'm happy to see these alternative tools evolve and work for other people.
This sort of thing is a constant tension and it's highly likely to be a different optimum for every individual, but it's also important not to ignore genuine improvements for the sake of comfort/familiarity.
I suspect, in general, age has a fair amount to do with it (I certainly notice it in myself) but either way I think it's worth evaluating new things every so often.
Something like rg specifically can be really tricky to evaluate because it does basically the same thing as the built-in grep, but sometimes just being faster crosses a threshold where you can use it in ways you couldn't previously.
E.g., some kind of find-as-you-type system: if it took 1 s per letter it would be genuinely unusable, but 50 ms might take it over the edge, so now it's an option. Stuff like that.
I don't understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That's confusing to me.
Programmers are too enamored with lowercase names. Why not Ripgrep? Then I could surmise that there might not be some program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I've been puzzled in the past when inputting `stgit` doesn't work. But here they call it StGit for short, and the actual command is typeset in verbatim (stg(1) would have also worked).
Because we are constantly writing variables that are lowercase. Coming up with a name that is both short but immediately understandable is what we live for. Variables are our shrine, we stare at them everyday and are used to their beauty and simplicity.
How would you capitalise it? RipGrep? RIPGrep? You'd need to pick a side and lose the pun. (And of course grep itself would need to be GReP if we took it all the way.)
This was my first thought as well. I think I end up just calling it nvim sometimes even conversationally, the binary name is the most "real" thing to me.
The fun part is that it's pretty easy to "rewrite" ripgrep in Rust, because burntsushi wrote it as a ton of crates which you can reuse. So you can reuse these to build your own, with blackjack and hookers.
A "ton of crates" is IMO the best way to write large Rust programs. Each crate in Rust is a compilation unit, the equivalent of one `.c` file in C. If they don't depend on one another, each crate can be compiled in parallel. It makes building the whole project faster, often significantly so. As with anything one can take it too far, but as long as each crate makes sense as an independent unit it's good.
Isn't creating a bunch of crates pretty annoying, logistically (in terms of mandatory directory structure and metadata files per crate)? (Compared with C/C++ individual .c/.cpp files being compilation units.) And does parallel compilation of crates break LTO?
Gotta add a +1 for this. I wanted to do some ignore files etc for a project.
I thought "well I kinda want to do what rg does". Had a little glance and it was already nicely extracted into a separate crate that was a dream to use.
And burntsushi is one of us: he's regularly here on HN. Big thanks to him. As soon as rg came out I was building it on Linux. Now it ships stock with Debian (since Bookworm? Don't remember): thanks, thanks and more thanks.
> > IMO, as long as the time differences remain small, I'm totally okay with ripgrep being slower by default on smaller corpora if it means being a lot faster by default on bigger corpora.
Note that this is the author of ripgrep replying to a third-party commenter asking whether rg isn't already lightweight, and comparing the two under various possible definitions of "lightweight".
Any sysadmin worth their salt turns on extended regular expressions (with `-E` or `egrep`), which Ripgrep's regex syntax is a superset of, more or less.
> and incompatible with grep syntax, which makes it useless to most system admins
I wonder how much the above reflects a dated and/or stereotyped view? Who here works with "sysadmins"? I mean... devops-all-the-places now, right? :P Share your experiences?: I'm curious.
If I were a sysadmin, I'd have some kind of "sanity check" script if I had to manage a fleet of disparate systems. I'd run it automatically when I log on to a system, checking things like: Linux vs BSD, what tools are installed, load, any weirdnesses, etc. Heck, maybe even create aliases that abstract over all of it, as much as possible. Maybe copy over a helix binary for editing too, if that was kosher.
It seems to me that `rg` is the number one most important part that enables LLMs to be smart agents in a codebase. Who would have thought that a code search tool would enable AGI?
Faster is not always the best thing. I still remember when VS Code changed to ripgrep: I had to change my habits using it. Before then I could just open VS Code on any folder and do something with it, even if the folder contained millions of small text files. It worked fine before, but then rg was picked, and it happily used all of my CPU cores scanning files, making me unable to do anything for a while.
To be honest, I hate all the new Rust replacement tools; they introduce new behavior just for the sake of it, and it's annoying.
One of my favorite moments in HN history was watching the authors of the various search tools decide on a common ".ignore" file as opposed to each having their own: https://news.ycombinator.com/item?id=12568245
I would argue that grep-like tools which read .gitignore violate the Principle of Least Astonishment (POLA). It would be fine if there were a --ignore flag to enable such functionality, but defaulting to it just feels wrong to me. Obviously smarter people than I disagree, but my dumdum head just feels that way.
> Obviously smarter people than I disagree, but my dumdum head just feels that way.
That's absolutely not it. What you're describing is part of the UNIX philosophy: programs should do one thing and do it well, and they should function in a way that makes them very versatile and composable, etc.
And that part of the philosophy works GREAT when everything follows another part of the philosophy: everything should be based on flat text files.
But for a number of reasons, and regardless of whatever we all think of those reasons, we live in a world that has a lot of stuff that is NOT the kind of flat text file grep was made for. Binary formats, minified JS, etc. And so to make the tool more practical on a modern *nix workstation, suddenly more people want defaults that are going to work on their flat text files and transparently ignore things like .git.
It's just that you've shown up to a wildly unprincipled world armed with principles.
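The "smart defaults" being debated above are easy to sketch. Here is a hypothetical, simplified walker in Python (not ripgrep's actual implementation, which is Rust and far more sophisticated): it prunes hidden directories like .git and skips files that look binary, using the classic NUL-byte sniff that grep-family tools have long relied on.

```python
import os

def looks_binary(path, probe=8192):
    """Classic grep heuristic: a NUL byte in the first block of the
    file means 'binary', so a smart-default search skips it."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe)

def search(root, needle):
    """Walk `root`, skipping hidden entries and binary-looking files,
    yielding (path, line_number, line) for every matching line."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (.git, .cache, ...) in place so
        # os.walk never descends into them: the "smart default".
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue  # skip hidden files too
            path = os.path.join(dirpath, name)
            if looks_binary(path):
                continue
            with open(path, encoding="utf-8", errors="replace") as f:
                for i, line in enumerate(f, 1):
                    if needle in line:
                        yield path, i, line.rstrip("\n")
```

Real tools layer .gitignore parsing on top of this pruning step; the sketch only shows why "grep everything" and "grep what a developer probably means" end up being different defaults.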
Back in the day I would have agreed with you, but ever since there's JS everywhere, you end up with minified JS that is megabytes big and matches everything. I still have muscle memory for `grep -r`, and it almost always ends up with some JS file I didn't know existed ruining the moment.
> Obviously smarter people than I disagree, but my dumdum head just feels that way.
No, you are correct, do not doubt yourself. Baked-in behavior catering to a completely separate tool is bad design. Git is the current version control software, but it's not the first nor the last. Imagine if we moved to another source control system and were burdened with .gitignore files. No thanks.
The Unix tools are designed to be good and explicit at their individual jobs so they can be easily composed together to form more complex tools that cater to the task at hand.
That's why a bunch of tools agreed on something that was not tied to any one of them. What did I miss? We are explicitly not talking about .gitignore.
But both rg and ag ignore paths in .gitignore by default still
ah
> We are explicitly not talking about .gitignore.
Actually we are, because some utter idiot wrote:
> grep-like tools which read .gitignore violate POLA.
> utter idiot
How about we keep it civilized
Given that they were quoting themselves, I think it's probably intended as humor. (I had to scroll up to check but I immediately suspected, and it made me chuckle)
I have to agree here. I love ripgrep, but at times I've had to go back to regular grep because I couldn't figure out what it was ignoring and why, and there were far too many settings to figure it out.
FYI, `-uu` turns off both ignoring based on special files (.gitignore, etc) and ignoring hidden files.
And if you want to ignore what you want rg to ignore, not what you want git to ignore? Can you do that?
--no-ignore-vcs
Or some combination of --no-ignore (or -u/--unrestricted) with --ignore-file or --glob.
It's a tough one. Lately I've been doing `rg -u` every single time because too many things get ignored and I can't be bothered to figure out how to configure it more cleanly to do what I want by default.
ugrep agrees with you [1].
[1]: https://ugrep.com/
An `--ignore-file=` flag would be nice I guess:
--ignore-file=.ignore
--ignore-file=.gitignore
--ignore-file=.dockerignore
--ignore-file=.npmignore
etc
but then, assuming all those share the same "ignore file syntax/grammar"...
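Assuming all those files really do share one glob grammar, merging several ignore files is mostly mechanical. A toy Python sketch of the idea (deliberately leaving out '!' negation, anchoring, and '**', which real .gitignore semantics include, so treat it as an illustration only):

```python
from fnmatch import fnmatch

def load_patterns(*ignore_file_texts):
    """Merge patterns from several ignore files (.ignore, .gitignore,
    .dockerignore, ...).  Blank lines and '#' comments are skipped;
    trailing '/' on directory patterns is dropped for simplicity."""
    patterns = []
    for text in ignore_file_texts:
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                patterns.append(line.rstrip("/"))
    return patterns

def is_ignored(path, patterns):
    """A relative path is ignored if the whole path, or any single
    component of it, matches any pattern -- roughly how directory
    patterns behave in practice."""
    parts = path.split("/")
    return any(
        fnmatch(part, pat) or fnmatch(path, pat)
        for pat in patterns
        for part in parts
    )
```

The point of a shared grammar is exactly that `load_patterns` doesn't care which tool wrote which file; the arguments over *which files to read by default* are a separate question.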
It's good if they can share syntax. You use the same English words to ask Alice and Bob questions, but when you say "So, tell me, Alice…" you don't want Bob to answer you instead. Using another tool's config by default, making it difficult/impossible to use the dedicated config, is the most annoying thing I can imagine. If that's what rg does, I guess that must be the reason I couldn't switch.
Agreed. It's a footgun.
It's the kind of thing that maybe makes sense today. Less likely to make sense twenty years from now, though.
But that's the kind of problem that only successful things have to worry about.
It probably wouldn't have made sense twenty years ago (or 60 years ago, when IBM engineers first wrote about the Principle of Least Astonishment [1] in 1966).
But it does make sense today.
I'd argue that modern computers do many astonishing and complicated and confusing things - for example, they synchronize with cloud storage through complex on-demand mechanisms that present a file as being on the user's computer, but only actually download it when it's opened by the user - and they attempt to do so as a transparent abstraction of a real file on the disk. But if ripgrep tried to traverse your corporate Google Drive or Dropbox or OneDrive, users might be "astonished" when that abstraction breaks down and a minor rg query takes an hour and 800 GB of bandwidth.
It used to be that one polymath engineer could have a decent understanding of the whole pyramid of complexity, from semiconductors through spinlocks on to SQL servers. Now that goal is unachievable for most, and tools ought to be sophisticated enough to help the user with corner cases and gotchas that make their task more difficult than they expected it to be.
[1]: https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...
It is already leaking today. I hope/expect that jujutsu takes over in the coming years.
I've read this multiple times over the years, and this post is still the most interesting and informative piece describing the problem of making a fast grep-like tool. I love that it doesn't just describe how ripgrep works but also how all the other tools work, and then compares the various techniques. It's simultaneously a tutorial and an expert deep dive. Just a beautiful piece of writing. In a perfect world, all code would be similarly documented.
The same author also wrote the de facto standard regex library in Rust, and somewhat recently a new time and date library (jiff).
The comparison between jiff, chrono, time and hifitime is just as good of a read in my opinion: https://github.com/BurntSushi/jiff/blob/HEAD/COMPARE.md
(And they have also written interesting things on regex, non-regex string matching, etc.)
He also wrote the Rust CSV parsing library, and then the command-line utility xsv that uses it. It's an incredible piece of software for breaking down massive CSV files where fields contain line breaks and you thus cannot use sed to just get ranges of lines.
Such a good read. I actually went back through it the other day to steal the search-for-the-least-common-byte idea to speed up my search tool https://github.com/boyter/cs which, when coupled with the SIMD upper/lower-case search technique from fzf, cut the wall-clock runtime by a third.
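For anyone unfamiliar, the least-common-byte trick mentioned here works roughly like this: rather than scanning for the first byte of the needle, scan for whichever needle byte is rarest, then verify the full needle around each hit. A toy Python sketch (my understanding is that real implementations, e.g. ripgrep's memchr crate, use a precomputed background-frequency table rather than counting the haystack as done here):

```python
def find_all(haystack: bytes, needle: bytes):
    """Substring search built on the 'least common byte' trick: pick
    the needle byte that occurs least often, let the fast bytes.find
    (memchr-like) skip to each occurrence of that rare byte, then
    verify the full needle around it."""
    if not needle:
        return []
    # Frequency of every byte value in the haystack.  Real tools use a
    # static table of typical byte frequencies instead of this pass.
    counts = [0] * 256
    for b in haystack:
        counts[b] += 1
    # Offset within the needle of its rarest byte.
    offset = min(range(len(needle)), key=lambda i: counts[needle[i]])
    rare = needle[offset]
    hits, pos = [], haystack.find(rare, offset)
    while pos != -1:
        start = pos - offset
        if haystack[start:start + len(needle)] == needle:
            hits.append(start)
        pos = haystack.find(rare, pos + 1)
    return hits
```

The payoff is that most of the time is spent in a highly optimized single-byte scan that rarely stops on a false candidate, instead of repeatedly stalling on a common first byte like a space or 'e'.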
There was this post from Cursor https://cursor.com/blog/fast-regex-search today about building an index for agents due to them hitting a limit on ripgrep, but I'm not sure what codebase they are hitting that warrants it. Especially since they would have to be at 100-200 GB to be getting to 15 s of runtime. Unless it's all matches, that is.
Yeah, that Cursor blog post is a bit iffy, since they just brush over the "ripgrep is slow on large monorepos" claim, move on to the techniques they used, and then completely ignore the fact that you have to build and maintain the index.
On a mid-size codebase, I fzf- and rg-ed through the code almost instantly, while watching my coworker's computer slow to a crawl when PyCharm started reindexing the project.
I'm not into the low-level minutiae, but on large code bases I sometimes see a lag on the first few rg runs and then it's fast, which I attribute to some OS-level caching.
Perhaps they run their software on operating or file systems that can't do it, or on hardware with different constraints than the workstation flavoured laptops I use.
Ripgrep is used as the default search backend for ArchiveBox; such a good tool. I was on ag (the-silver-searcher) for years before I switched, but haven't gone back since.
There's also RGA (ripgrep-all) which searches binary files like PDFs, ebooks, doc files: https://github.com/phiresky/ripgrep-all
When I first heard about ripgrep my reaction was laughing. grep had been too established. No way something that isn't 100% compatible with grep could get any traction.
And I was dead wrong. Overnight everyone uses rg (me included).
I don't know if this is coincidence or not, but Cursor just made a post breaking down why they moved to their own solution in place of ripgrep, and it makes a lot of sense from a cursory (haha) read.
https://cursor.com/blog/fast-regex-search
I just got ripgrep ported to IRIX over the weekend.
It's fast even on a 300 MHz Octane.
Is IRIX experiencing a hobbyist revival or something? This is the second IRIX reference I've seen on here in the past two days, and there was a submission a day or two ago (cf. a Voodoo video card?) as well. I haven't personally encountered IRIX in the wild since a company I worked at in 2003. I suppose SGI has always had a cool factor, but it's unusual seeing it come up in a cluster of mentions like this.
It ebbs and flows.
SGUG tried hard to port newer packages for IRIX for several years but hit a wall with ABI mismatches leading to GOT corruption. This prevented a lot of larger packages from working or even building.
I picked up the effort again after wondering if LLMs would help. I ran into the ABI problems pretty quickly. This time, though, I had Claude use Ghidra to reverse-engineer the IRIX runtime linker daemon, which gave the LLM enough to understand that the memory structures I'd been using in LLVM were all wrong. See https://github.com/unxmaal/mogrix/blob/main/rules/methods/ir... .
After cracking that mystery I was able to quickly get "impossible" packages building, like WebKit, Qt5, and even small bits of Go and Rust.
I'm optimistic that we'll see more useful applications built for this cool old OS.
That is pretty neat. I guess this sort of unlocking and unblocking effort is exactly what's needed for a revival.
I'm sort of thinking of AmigaOS/Workbench as well, although, perhaps because of what I would assume was always a much larger user base than SGI had, it maybe never went away like SGI and IRIX did.
It is great seeing these old platforms get a new lease of life.
Ooh, that's super interesting. I assume you shared the recipe with the IRIX community? I remember keeping Netscape up to date on my Indy was already a struggle in 2002.
I'd love to read a blog post on porting back to IRIX!
I was using ripgrep once and it had a bug that led me down a terrifying rabbit hole. I can't recall what it was, but it involved not being able to find text that absolutely should have been there.
Eventually I was considering rebuilding the machine completely, but after a very long time digging deep into the rabbit hole, I tried plain old grep, and there was the data, exactly where it should have been.
It's a vague story since it was a while back; I don't remember the specifics, but I sure recall the panic.
I agree, it's a great tool with a catastrophically wrong default that silently and unpredictably catches people off-guard. I've tried using ripgrep many times but have been burnt too many times and can never trust its search to be comprehensive. It absolutely fails to find important stuff, and I can rarely predict whether the files it's going to skip intersect with my files of interest. And at this point I'm too burnt to care to pass flags to stop it from doing that. Which basically means I always run grep unless I know the number of matches beforehand and it's too large a directory to wait a few seconds for, in which case I run grep after rg fails to find it.
If it actually matched grep's contract with opt-in differences that'd be a gamechanger and actually let it become the default for people, but that ship seems to have sailed.
idk if this was your issue but I’m posting this because it’s not obvious (especially the default behavior):
Was it confirmed to be a bug?
Sometimes I forget that some of the config files I have for CI in a project are under a dot directory, and therefore ignored by rg by default, so I have to repeat the search giving the path to that config subdirectory if I want to see the results under it (or use some extra flags for rg to not ignore dot directories other than .git).
Sorry I don't recall exactly but I don't think it was anything special like a hidden or binary file.
I still use it, but I've never fully trusted it since then; I double-check.
Was the file in a .gitignore by any chance? I've got my home folder in git to keep track of dot/config files and that always catches me out. Really dislike it defaulting to that ignoring files that are ignored by git.
In a repo, sure. On your own fs it feels like a footgun and every tool copies the behavior a little differently, which means you stop trusting the results and start wondering which files got skipped.
> Really dislike it defaulting to that ignoring files that are ignored by git.
It's the reason I started using it. Got sick of grep returning results from node_modules etc.
You started using it because it had that capability I imagine, not because it is the default. You could easily just alias a command with the right flag if the capability was opt-in.
No, because it was default.
> You could easily just alias a command with the right flag if the capability was opt-in.
I tried a search to make grep ignore .gitignore because `--exclude=...` got tedious and there was ripgrep to answer my prayers.
Maintaining an alias would be more work than just `rg 'regex' .venv` (which is tab-completed after `.v`) the few times I'm looking for something in there. I like to keep my aliases clean and not have to use rg-all to turn off the setting I turned on. Like in your case, `alias rg='rg -u'`, now how do you turn it off?
I had that happen recently too… Basically `rg x` would show nothing but `grep -r x` showed the lines for any x. I tried multiple times with different x, then kept using grep -r for a while. After a few days I started using rg again and it worked fine, but now I tend to use grep -r occasionally too, just to make sure.
Next time that happens try looking at the paths, adding a pair of -u, or running with --debug: by default rg will ignore files which are hidden (dotfiles) or excluded by ignore files (.gitignore, .ignore, …).
See https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#a... for the details.
I use "grep" to search files (it should never skip any unless I tell it to do otherwise) and "git grep" to be a programmer searching a codebase (where it should only look at code files unless I tell it to do otherwise). Two different hats.
I wouldn't want to use tools that straddle the two, unless they had a nice clear way of picking one or the other. ripgrep does have "--no-ignore", though I would prefer -a / --all (one could make their own with alias rga='rg --no-ignore')
Maybe related to text encodings?
I think ripgrep will not search UTF-16 files by default. I had some such issue once at least.
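The underlying failure mode is easy to reproduce with any byte-oriented tool, since UTF-16 interleaves NUL bytes between the ASCII characters (a sketch using iconv; the filename is made up):

```shell
# Fabricate a UTF-16LE file (no BOM): "needle" is stored as n\0e\0e\0d\0l\0e\0,
# so a byte-wise search for the ASCII string finds nothing.
printf 'needle\n' | iconv -f UTF-8 -t UTF-16LE > u16.txt

grep -c needle u16.txt || echo "no match"
# ripgrep transcodes UTF-16 when a BOM is present; for BOM-less files you can
# force a codec, e.g. `rg --encoding utf-16le needle u16.txt` (check your
# version's docs for the exact flag behavior).
```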
Could have been an incorrectly inferred encoding scheme?
I ran into that with pt, and it definitely made me think I was going mad[0]. I can't fully remember if rg suffered from the same issue or not.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...
When Claude Code uses `grep` it's actually using `rg` underneath
Codex does as well, although I think it actually shows it running `rg`.
Oh, interesting! I had a user prompt suggesting it use rg, not grep, but was annoyed that it still seemed to use grep.
I don't remember why I didn't switch from ag, but I remember it was a conscious decision. I think it had something to do with configuration, rg using an implicit '.ignore' file (a super-generic name instead of a proper tool-specific config) or even .gitignore, or something else very much unwarranted, that made it annoying to use. I can't remember, really; I only remember that I spent too much time trying to make it behave and decided it wasn't worth it. Anyway, faster is nice, but somehow I don't ever feel that ag is too slow for anything. The switch from the previous one (what was it? ack?) felt like a drastic improvement, but ag vs. rg wasn't much of a difference to me in practice.
> I didn't switch from ag, [...] rg using implicit '.ignore' file (a super-generic name instead of a proper tool-specific config)
The ".ignore" name was actually suggested by the author of ag (whereas the author of rg thought it was too generic): https://news.ycombinator.com/item?id=12568245
Totally agree with the author of rg here: config names should be unambiguous. Must have been something else, then. As I've said, I can't remember what the specific problem was, only that it wasn't quite compatible with the workflow I was used to, and now it'd take another all-in attempt at switching to figure out what was so annoying to me back then.
If you ever do, please reply to this with why.
I was just trying to remember why I switched _to_ rg.
It's nice and everything, but I remember being happy with the tools before (I think I moved from grep to ack, then jumped to ag for performance, and to pt for reasons I don't remember).
It took me a while, but I remembered I ran into an issue with pt incorrectly guessing the encoding of some files[0].
I can't remember whether rg suffered from the same issue or not, but I do know after switching to rg everything was plain sailing and I've been happy with it since.
[0] https://github.com/monochromegane/the_platinum_searcher/issu...
For me it was trying to add a filter to search CMake files to ag and then realizing that the code had some rather stupid design decisions that prevented it. I wrote a pull request that fixed enough things to add the filter, got ignored by the maintainer and later realized that other people had already written the same filter and were ignored too.
One thing I learned over the years is that the closer my setup is to the default one, the better. I tried switching to the latest and greatest replacements, such as ack or ripgrep for grep, or httpie for curl, just to always return to the default options. Often, the return was caused by a frustration of not having the new tools installed on the random server I sshed to. It's probably just me being unable to persevere in keeping my environment customized, and I'm happy to see these alternative tools evolve and work for other people.
This sort of thing is a constant tension and it's highly likely to be a different optimum for every individual, but it's also important not to ignore genuine improvements for the sake of comfort/familiarity.
I suspect, in general, age has a fair amount to do with it (I certainly notice it in myself) but either way I think it's worth evaluating new things every so often.
Something like rg specifically can be really tricky to evaluate, because it does basically the same thing as the stock grep, but sometimes just being faster crosses a threshold where you can use it in ways you couldn't previously.
E.g. some kind of find-as-you-type system: if it took 1s per letter it would be genuinely unusable, but 50ms might take it over the edge, so now it's an option. Stuff like that.
Story of my life basically. It is just too much effort to keep customization preserved.
nowgrep is supposedly even faster than ripgrep:
https://x.com/CharlieMQV/status/1972647630653227054
Closed-source, no-release. Why bother linking it?
*on windows as the tweet says
They discovered the nowgorithm.
That’s because it doesn’t do the same work. It’s not an equivalent tool to grep.
(2016)
Maybe the first rust "killer app" in retrospect.
It’s a pure delight to read these docs / this pitch.
qgrep is faster if you're fine with indexing. worth it
Doesn't ripgrep support indexing too?
not yet, but Cursor is working on it.
codex is basically a ripgrep wrapper at this point :)
fd:find::rg:grep
Someone please make an awesome new sed and awk.
sd exists; dunno about awk.
https://github.com/chmln/sd
How is GNU / BSD sed and awk not awesome?
Limited regex capabilities. A fast tool that does what sed and awk do with PCRE2 would be amazing.
> The binary name for `ripgrep` is `rg`.
I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.
Programmers are too enamored with lower-case names. Why not Ripgrep? Then I could surmise that there might not be a program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.
Look at Stacked Git:
https://stacked-git.github.io/
> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.
> ... The `stg` command line tool ...
Now, I’ve been puzzled in the past when inputting `stgit` doesn’t work. But here they call it StGit for short, and the actual command is typeset in verbatim (stg(1) would have also worked).
Because we are constantly writing variables that are lowercase. Coming up with a name that is both short but immediately understandable is what we live for. Variables are our shrine, we stare at them everyday and are used to their beauty and simplicity.
How would you capitalise it? RipGrep? RIPGrep? You’d need to pick a side and lose the pun. (And of course grep itself would need to be GReP if we took it all the way)
I wrote Ripgrep.
And they wrote "... you'd need to pick a side and lose the pun."
And I am able to read four sentences.
It’s only 2 characters - if you use it all the time it becomes muscle memory.
You can simply add a shell alias with whatever name you like and move on.
True, but easier said than done, because one often needs to work in more shells than just the one on their local machine.
This is a nonstandard tool. If you can't customize your machine, you already don't have it.
But it could be one day..
Do something like this to fall back to plain grep. You will somehow have to share these configurations across machines though.
You can't in most corporate env machines.
You may be able to download ripgrep, and even execute it (!), but god forbid you can create an alias in your shell in a persistent manner.
huh? If you can download and execute files, you can alias it. Either in your .bashrc file, or by making a symlink.
I daily-drive Linux, but I hop from client to client and have probably served about 200 different organizations so far.
Most corporate machines are Windows boxes with PowerShell and cmd.exe heavily restricted, no admin rights, and anti-malware software surveilling I/O like a hawk.
You might get a git bash if you are lucky, but it's usually so slow it's completely unusable.
In one client I once tried to sneak in Clink. Flagged instantly by security and reported to HR.
It's easy to forget that life outside the HN bubble is still stuck there.
`[citation needed]`
> You can't in most corporate env machines.
Really? "most" even? What CAN you do if you can't edit files in your own $HOME?
Don't get me started on `nvim` to run neovim...
This was my first thought as well. I think I end up just calling it nvim sometimes even conversationally, the binary name is the most "real" thing to me.
Is it still?
still a good read
Hasn’t someone rewritten ripgrep in Rust by now? C’mon, it’s 2026. Oh wait, it was written in Rust (back in 2016).
The fun part is that it's pretty easy to “rewrite” ripgrep in Rust, because burntsushi wrote it as a ton of crates which you can reuse. So you can use those to build your own, with blackjack and hookers.
A "ton of crates" is IMO the best way to write large Rust programs. Each crate in Rust is a compilation unit, the equivalent of one `.c` file in C. If they don't depend on one another, each crate can be compiled in parallel. It makes building the whole project faster, often significantly so. As with anything one can take it too far, but as long as each crate makes sense as an independent unit it's good.
Isn't creating a bunch of crates pretty annoying, logistically (in terms of mandatory directory structure and metadata files per crate)? (Compared with C/C++ individual .c/.cpp files being compilation units.) And does parallel compilation of crates break LTO?
Gotta add a +1 for this. I wanted to handle ignore files etc. for a project.
I thought "well I kinda want to do what rg does". Had a little glance and it was already nicely extracted into a separate crate that was a dream to use.
Thanks @BurntSushi!
Waiting for the zig port
https://github.com/alexpasmantier/grip-grab
Someone kinda did
looks abandoned. last commit was 2 years ago.
And burntsushi is one of us: he's regularly here on HN. Big thanks to him. As soon as rg came out I was building it on Linux. Now it ships stock with Debian (since Bookworm? I don't remember): thanks, thanks and more thanks.
Big thanks to him indeed (and for other projects in Rust space as well).
// really hoping openai wouldn't now force him to work on some crappy codex stuff if he stays there / in astral.
ugrep is my daily driver. https://ugrep.com
The TUI is great, and approximate matches are insanely useful.
There is also upgrep, which is quite a good project. https://github.com/Genivia/ugrep
This is what my default grep is. You probably want to update your comment to reflect the correct name. (ugrep)
I recall that it was also very close in performance. When it was posted here, it beat ripgrep! https://news.ycombinator.com/item?id=38819262
(2024) gg: A fast, more lightweight ripgrep alternative for daily use cases
https://reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_more_li...
> > IMO, as long as the time differences remain small, I'm totally okay with ripgrep being slower by default on smaller corpora if it means being a lot faster by default on bigger corpora.
Also something-something about dependencies (a Rust staple): https://www.reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_mor...
Note that this is the author of ripgrep replying to a third-party commenter asking whether rg isn’t already lightweight, and comparing the two under various possible definitions of “lightweight”.
Yes.
and incompatible with grep syntax, which makes it useless to most system admins
Any sysadmin worth their salt turns on extended regular expressions (with `-E` or `egrep`), which Ripgrep's regex syntax is a superset of, more or less.
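For what it's worth, everyday EREs carry over unchanged (a quick sanity check; the rg line is commented out since it may not be installed, and the input file is made up):

```shell
# The same POSIX ERE works in grep -E and in rg's default syntax.
printf 'foo\nbar\nbaz\nqux\n' > input.txt

grep -E 'foo|ba[rz]' input.txt    # foo, bar, baz
# rg 'foo|ba[rz]' input.txt       # same three lines, if rg is installed
```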
> and incompatible with grep syntax, which makes it useless to most system admins
I wonder how much the above reflects a dated and/or stereotyped view. Who here works with "sysadmins"? I mean... devops-all-the-places now, right? :P Share your experiences; I'm curious.
If I were a sysadmin, I'd have some kind of "sanity check" script if I had to manage a fleet of disparate systems. I'd use it automatically when I logon to a system. Checking things like: Linux vs BSD, what tools are installed, load, any weirdnesses, etc. Heck, maybe even create aliases that abstract over all of it, as much as possible. Maybe copy over a helix binary for editing too, if that was kosher.
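The whole check only needs POSIX bits, something like this (the tool wishlist and output format are invented for illustration):

```shell
# Report the OS flavor and which of a wishlist of tools are present.
os=$(uname -s)    # e.g. Linux, FreeBSD, Darwin
echo "os: $os"
for tool in rg fd grep sed awk; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "have: $tool"
    else
        echo "missing: $tool"
    fi
done
```

From there it's a short step to defining aliases that pick the best available tool, as described above.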
It seems to me that `rg` is the number one most important part that enables LLMs to be smart agents in a codebase. Who would have thought that a code search tool would enable AGI?
Faster is not always better. I still remember when VS Code switched to ripgrep, I had to change my habits: before then I could open VS Code on any folder and do something with it, even if the folder contained millions of small text files. It worked fine before, but once rg was picked it happily used all of my CPU cores scanning files, leaving me unable to do anything for a while.
To be honest, I hate all the new Rust replacement tools; they introduce new behavior just for the sake of it, and it's annoying.