Google copybara: moving code between repositories

(github.com)

283 points | by reconnecting 18 hours ago

56 comments

  • MarkSweep 16 hours ago

    Some other interesting tools in the space. Rust is using a tool called Josh to sync commits:

    https://josh-project.dev

    The blog post from the Rust people:

    https://blog.rust-lang.org/inside-rust/2026/06/04/how-josh-h...

    Meta used to have an open source tool called fbshipit. But according to its open source repo they no longer use it:

    https://github.com/facebookarchive/fbshipit

    Any others in this space?

  • dijit 7 hours ago

    Ha! This post comes right at the time where I finally got around to open sourcing the patches I made to provide Perforce support which I use at my gamedev studio[0].

    I find it kind of funny that Perforce support was not included (only Git support seems to exist in any meaningful way), despite Copybara's primary use being to release internal Google code (which lives in Piper, a fork of Perforce).

    I actually got a bit worried when looking at the Git history before making the PR, because there are a lot of Gerrit Change-Id markers (meaning there's a Gerrit code review system somewhere that I'm not privy to), so I might have submitted a PR that never gets upstreamed.

    I feel the same pang of pain from the lack of a Perforce version of Gerrit/Rietveld, but you can't ask for everything!

    [0]: https://github.com/google/copybara/pull/347

    • bruckie 5 hours ago

      One reason there's no Perforce support for Gerrit/Rietveld is that Google doesn't use those tools for changes to code that's stored in Piper. Instead, they use Critique:

      - https://abseil.io/resources/swe-book/html/ch19.html

      - https://read.engineerscodex.com/p/how-google-takes-the-pain-...

      I haven't found anything external that's as good, and am astounded that GitHub's incredibly lackluster PR review tooling is acceptable to most people. If anyone is aware of something in Critique's league, I'd love to hear about it!

      Edit: I tried to do a reasonably thorough survey a couple of years ago when I left Google and https://codeapprove.com/ was the closest I found, but there were still many gaps.

      • lima 2 hours ago

        Gerrit's new UI comes pretty close, and it's also what Google uses for projects living outside of google3.

        GitHub's PR review workflow is archaic in comparison, especially now that every page takes 2-3 seconds to load.

    • Arainach 6 hours ago

      > which lives in Piper, a fork of Perforce

      This is incorrect. Piper is API-compatible with Perforce but is a totally different implementation, not a fork.

    • lima 7 hours ago

      Don't worry, they accept PRs on that repo. They just merge them internally and then re-export them.

      There are some variations, but this is generally the same with all open source projects which live in their internal monorepo, such as gVisor or Bazel.

  • schrodinger 16 hours ago

    To those who have used it: is it handy for situations where you have multiple repos that want to share a little code, but it's not worth the trouble of extracting a library, referencing it, publishing versioned releases, updating dependent repos, etc?

    And instead just "sync" a code folder from one main repo (perhaps containing common domain models) to other repos?

    Basically the Go philosophy that a little bit of copying is better than a lot of dependency?

    • ASinclair 16 hours ago

      It’s largely used for syncing external open source projects with the monorepo. Policy is to require source code imports over built artifacts, though you can get exceptions.

      Some projects are also developed in the monorepo and exported via Copybara.

      My team also uses it to version Starlark rule sets internally.

      • lwhi 8 hours ago

        I suppose it mitigates the potential risk of libraries being poisoned?

        • baliex 7 hours ago

          Well kind of, or you just end up copying the poisoned version directly into your repo rather than having it as a dependency. Same outcome.

          I suppose if you're running some security analysis on code in your own repo, the fact that you've copied the code in means that it'll run on your third party dependencies too, since they no longer appear to be third party.

      • paulddraper 15 hours ago

        Source code imports versus artifacts is really neither here nor there. Go uses source code imports too.

        The key part for Copybara is that Google makes changes to the OSS projects from within the internal repo, while everyone else makes changes through the public repos.

    • xyzzy_plugh 16 hours ago

      It's for when you have a monorepo internally, and want to publish parts of it as open source to the world. They still need to live in the monorepo, so this is the solution.

      Having a public repo as a dependency for your private corporate repo is a pain in the ass development-wise. Having a tree of such dependencies is a migraine.

      • fipar 13 hours ago

        It can also be used if you want part of your monorepo to track something open source from the world.

        Say, to rebase upstream MySQL changes onto a fork in the monorepo (in a random, non-specific example)

        • yaskou 11 hours ago

          Yeah, that's the fun part. Probably built first for exporting monolith slices to OSS, but the reverse direction is more interesting to me. Tracking an upstream or keeping a private fork in sync. That's what makes Copybara useful well beyond the monorepo use case.
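
        The bookkeeping behind that upstream-tracking direction can be pictured in a few lines: remember the last upstream revision you imported and, on each run, take only the upstream commits after it. This is a hypothetical Python sketch over a linear history of made-up commit IDs, not Copybara's implementation; the real tool also has to deal with merges, transforms, and rewritten history.

```python
from typing import Optional


def pending_imports(upstream: list[str], last_imported: Optional[str]) -> list[str]:
    """Return the upstream commits (oldest first) that still need importing.

    `upstream` is a linear history, oldest first. If nothing has been
    imported yet, every upstream commit is pending.
    """
    if last_imported is None:
        return list(upstream)
    try:
        idx = upstream.index(last_imported)
    except ValueError:
        # The recorded baseline is gone upstream (e.g. after a force-push).
        raise ValueError(f"baseline {last_imported!r} not found upstream") from None
    return upstream[idx + 1:]


history = ["a1", "b2", "c3", "d4"]       # hypothetical upstream commit IDs
print(pending_imports(history, "b2"))     # ['c3', 'd4']
```

        Raising when the recorded baseline has vanished is a deliberate choice in this sketch: after a force-push upstream, stopping is usually safer than silently re-importing everything.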

    • klodolph 13 hours ago

      Copybara can do that but I think it will be annoying and tedious to use it that way. More annoying than the problem of extracting a library or shoving some files in a separate repo.

  • klodolph 13 hours ago

    Been using this for a while, mostly when I make a tool as part of a larger project and the tool is big enough to deserve its own release.

    It’s powerful enough to do a whole bidirectional shipping operation where you export and import code—no thanks, that’s a hassle. I use it mostly for a simple fire and forget export, where I take a folder out of its original repo and preserve the history. Then I just move development to the new repo. The new project layout can be completely different, but Git blame works and I’m happy with that.

    • rnagulapalle 13 hours ago

      The one-way pattern is actually how Google uses it internally too, syncing outward from their monorepo to GitHub. Bidirectional gets messy because transforms (path remapping, file exclusions, header stripping) are easy to apply in one direction but can't always be cleanly inverted. When both sides have diverged, Copybara's baseline tracking starts producing confusing results because semantically equivalent commits generate different SHAs after transform.
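
      To make the inversion problem concrete: an export transform that scrubs internal-only lines is lossy, so two different internal revisions can export to the same public text, and an external patch to that file has no unique place to land internally. This is a toy Python sketch, not Copybara code; the `# INTERNAL-ONLY` marker and `export_scrub` are hypothetical.

```python
INTERNAL_MARKER = "# INTERNAL-ONLY"  # hypothetical tag for lines that must not be exported


def export_scrub(text: str) -> str:
    """Lossy export transform: drop every line tagged as internal-only."""
    kept = [
        line
        for line in text.splitlines()
        if not line.rstrip().endswith(INTERNAL_MARKER)
    ]
    return "\n".join(kept)


# Two different internal revisions of the same file...
rev_a = "def connect():\n    audit('team-a')  # INTERNAL-ONLY\n    return dial()"
rev_b = "def connect():\n    audit('team-b')  # INTERNAL-ONLY\n    return dial()"

# ...export to identical public text, so an inverse "import" transform cannot
# know which internal line an external patch to this file should restore.
print(export_scrub(rev_a) == export_scrub(rev_b))  # True
```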

      One thing worth knowing: history "preservation" is actually cherry-picks with rewritten commits, not a true transplant. Git blame works because the file content and authorship carry over, but the SHAs are new. Copybara embeds the original SHA in a commit message trailer (GitOrigin-RevId), which is useful to know if you ever need to correlate commits across repos after the fact.
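
      If you do need that correlation, the trailer can be pulled out of commit messages with a few lines of Python. A minimal sketch, assuming the trailer sits on its own line as `GitOrigin-RevId: <40-character SHA>`; the commit message and SHAs below are made up, and in practice the messages would come from `git log` on the destination repo.

```python
import re
from typing import Optional

# Assumed trailer shape: "GitOrigin-RevId: <40 hex chars>" on its own line.
_TRAILER = re.compile(r"^GitOrigin-RevId:\s*([0-9a-f]{40})\s*$", re.MULTILINE)


def origin_rev(message: str) -> Optional[str]:
    """Return the origin SHA recorded in a commit message, or None if absent."""
    match = _TRAILER.search(message)
    return match.group(1) if match else None


def build_index(messages: dict[str, str]) -> dict[str, str]:
    """Map origin SHA -> destination SHA for every commit carrying the trailer."""
    index = {}
    for dest_sha, message in messages.items():
        origin = origin_rev(message)
        if origin is not None:
            index[origin] = dest_sha
    return index


example = """Fix off-by-one in range parser

GitOrigin-RevId: 0123456789abcdef0123456789abcdef01234567
"""

print(origin_rev(example))
```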

      • madeofpalk 10 hours ago

        > The one-way pattern is actually how Google uses it internally too, syncing outward from their monorepo to GitHub

        Do they not support contributions on the public repos back into the internal monorepo?

        • dmoy 10 hours ago

          There are three ways I've seen it done, though it being Google, I assume there are more.

          One is to try the bidirectional support in Copybara itself, though that usually requires more effort than it's worth.

          Another is to have the external repo be the source of truth and then always import into google3. Kythe used to do this at least, though I gather it's not done that way anymore.

          The third is to just replicate the patches externally (which is pretty easy to automate or semi-automate on a case-by-case basis), and verify that re-running the Copybara export produces a zero diff.

        • eddd-ddde 4 hours ago

          The "supported" workflow is you keep your source of truth in either the monorepo or the external repo. Then you export the current state of the source of truth to keep the mirrors up to date. Then, since we can assume the mirrors are up to date, the inverse transform can be applied to import change requests from the mirrors.

          It works well when the assumptions hold, that there isn't large divergence on either side. It can actually be largely automated.
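
          A rough illustration of why this works when the assumptions hold: if the export transform is a pure path remapping (similar in spirit to Copybara's `core.move`), it has an exact inverse, so you can export, accept a change on the mirror, import it back, and check that a re-export gives a zero diff. This is a hypothetical Python sketch that models each repository as a dict of path to file contents; the prefixes are made up.

```python
INTERNAL_PREFIX = "third_party/mylib/"  # hypothetical monorepo location
EXTERNAL_PREFIX = ""                    # repository root on the public side


def move(tree: dict[str, str], src: str, dst: str) -> dict[str, str]:
    """Re-root every path under `src` to `dst`; files outside `src` are dropped."""
    return {
        dst + path[len(src):]: body
        for path, body in tree.items()
        if path.startswith(src)
    }


def export(tree: dict[str, str]) -> dict[str, str]:
    return move(tree, INTERNAL_PREFIX, EXTERNAL_PREFIX)


def import_(tree: dict[str, str]) -> dict[str, str]:
    # Exact inverse of `export` for trees that live entirely under INTERNAL_PREFIX.
    return move(tree, EXTERNAL_PREFIX, INTERNAL_PREFIX)


internal = {
    "third_party/mylib/src/lib.rs": "pub fn f() {}",
    "third_party/mylib/README.md": "# mylib",
}

public = export(internal)
# Accept an external change on the mirror, then map it back into the monorepo.
public["src/lib.rs"] = "pub fn f() { /* fixed */ }"
reimported = import_(public)

# Re-exporting the re-imported tree reproduces the mirror exactly: zero diff.
print(export(reimported) == public)  # True
```

          Note that `move` drops anything outside the source prefix, so the round trip only holds for trees that live entirely under `INTERNAL_PREFIX`; once files appear outside it, the inverse is no longer exact.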

  • cole_ 3 hours ago

    We used Copybara to set up a hub-and-spoke model at Dagster where public-facing repos can live within our larger internal monorepo, and while it worked, we had to do a lot of hacky things.

    https://dagster.io/blog/monorepos-the-hub-and-spoke-model-an...

  • theodpHN 11 hours ago

    52+ years of progress... :-)

    July, 2026: Google copybara allows one to move code between two prod repositories

    March, 1974: IBM COPY allows one to move code between two prod partitioned data sets: OS/MVT and OS/VS2 TSO Data Utilities COPY, FORMAT, LIST, MERGE User's Guide and Reference https://www.computinghistory.org.uk/downloads/8987

  • DisagreeAndQuit 2 hours ago

    I work at Google and copybara workflows are the bane of my existence.

  • the_dude_ 11 hours ago

    If all you need is to sync repos without exclusions or transformations, I wouldn't bother. It could work for you until it doesn't, when they archive it or kill it like kaniko or so many other Google products/tools.

    GitLab has a really simple way to mirror from GitLab to GitHub or other Git vendors/servers.

    • vlovich123 10 hours ago

      I really doubt copybara ever gets killed. AFAIK it’s a pretty central tool to google3 and how they maintain and vendor OSS projects at scale

    • dmoy 10 hours ago

      yea copybara is really for doing transforms in one or both directions.

      E.g. translating from external bzl to internal blaze BUILD compatibility, changing between external imports and internal third_party style imports, etc etc.

      If it's a pure mirror, copybara is super overkill
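
      As a rough illustration of the import-rewriting kind of transform, here is a Python sketch that maps Go-style import paths between a public module path and an internal `third_party` location, in both directions. Both paths are hypothetical, and this only shows the idea; in Copybara you would normally declare such rewrites in the config (for example with `core.replace`) rather than write code.

```python
import re

EXTERNAL = "github.com/example/widgets"      # hypothetical public module path
INTERNAL = "monorepo/third_party/widgets"    # hypothetical internal location


def _rewrite(source: str, old: str, new: str) -> str:
    # Only touch quoted import paths equal to `old` or nested under it.
    pattern = re.compile(r'"' + re.escape(old) + r'(/[^"]*)?"')
    return pattern.sub(lambda m: '"' + new + (m.group(1) or "") + '"', source)


def to_internal(source: str) -> str:
    return _rewrite(source, EXTERNAL, INTERNAL)


def to_external(source: str) -> str:
    return _rewrite(source, INTERNAL, EXTERNAL)


src = 'import (\n\t"fmt"\n\t"github.com/example/widgets/render"\n)\n'
print(to_internal(src))
```

      Because the two prefixes map one-to-one, `to_external(to_internal(src))` gives back the original text, which is the property that makes a rewrite like this safe to run in both directions.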

  • namanyayg 17 hours ago

    Nice, I built something similar ~5 years ago using nested Git repos and scripts to combine private and public repos.

    My shell script definitely wasn't google scale tho!

    • UnfitFootprint 17 hours ago

      Yep, same. I thought it might be a wrapper around git subtree, but it looks like it’s doing quite a lot more!

      For example, altering commit author emails during sync.
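
      Conceptually, that author rewriting is a small mapping applied to each commit before it is written to the destination. The Python below is a hypothetical sketch of the idea, not Copybara's implementation (Copybara configures this declaratively through its `authoring` options); the email domain and default author are made up.

```python
from dataclasses import dataclass

DEFAULT_AUTHOR = "Project Bot <bot@example.org>"  # hypothetical fallback identity
INTERNAL_DOMAIN = "@corp.example"                 # hypothetical internal email domain


@dataclass(frozen=True)
class Commit:
    author: str   # "Name <email>"
    message: str


def scrub_author(commit: Commit) -> Commit:
    """Replace internal author identities with a default; keep everyone else."""
    email = commit.author.rsplit("<", 1)[-1].rstrip(">")
    if email.endswith(INTERNAL_DOMAIN):
        return Commit(author=DEFAULT_AUTHOR, message=commit.message)
    return commit


history = [
    Commit("Alice <alice@corp.example>", "Add parser"),
    Commit("Bob <bob@users.noreply.github.com>", "Fix typo"),
]
for c in map(scrub_author, history):
    print(c.author)
```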

  • nbobko 8 hours ago

    At my previous company, we tried to use this tool to sync parts of the code between two different Git repos. The tool turned out to be unacceptably slow.

    Handwritten bash scripts using git-replace and git-filter-repo [1] did a much better job.

    [1]: https://github.com/newren/git-filter-repo
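
    For comparison, one piece such scripts might handle, extracting a directory together with its history, maps onto `git filter-repo`'s `--path` and `--path-rename` options (its documentation describes `--subdirectory-filter DIR` as shorthand for `--path DIR/ --path-rename DIR/:`). The Python below only builds the command line; the directory names are hypothetical, and since filter-repo rewrites history in place, the command belongs in a fresh clone.

```python
import shlex


def filter_repo_cmd(subdir: str, new_root: str = "") -> list[str]:
    """Build a git filter-repo invocation that keeps only `subdir`'s history
    and re-roots it at `new_root` (empty string = repository root)."""
    src = subdir.rstrip("/") + "/"
    dst = (new_root.rstrip("/") + "/") if new_root else ""
    return ["git", "filter-repo", "--path", src, "--path-rename", f"{src}:{dst}"]


cmd = filter_repo_cmd("tools/linter")
print(shlex.join(cmd))
# To actually run it inside a fresh clone: subprocess.run(cmd, cwd=clone_dir, check=True)
```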

  • alok-g 15 hours ago

    Interesting. Does anyone know how this compares to using Git submodules and subtrees?

    I had used those to create a separate repo for website artifacts while the same artifacts also remain plugged into the webapp dev repo. (Both sides remain modifiable, and changes can be merged to the other side.)

    Thx.

    • eddd-ddde 4 hours ago

      The main benefits you get are transformations. You can leverage tooling to automate include remapping and things like that.

      But it's definitely not geared towards forks; it's meant for mirrors with deterministic and invertible transforms.

  • xyzzy_plugh 16 hours ago

    Copybara is one of those things that you should have set up yesterday.

    It works great and I've seen many teams gain significant productivity when collaborating in a monorepo with public bits.

    If you're even toying with an internal monorepo you owe it to yourself to give it a try.

  • jumploops 15 hours ago

    We’re in the process of open-sourcing a few sub-projects within a monorepo, and didn’t know this existed!

    I’m curious what downsides folks have experienced with this tool?

    Any tips?

    • veyh 11 hours ago

      If you're exporting more than a few commits, I suggest using local repos (/path/to/.git) for both the source and destination. Otherwise it'll be quite slow.

  • Gehinnn 9 hours ago

    Does this tool allow changes in both repositories? (with a 3 way merge strategy) git subtrees come close, but I have a use case where I need transformations/file filters on top.

  • ape4 4 hours ago

    Why doesn't it have a capybara as its logo!

  • whirlwin 8 hours ago

    Does anyone know if it is useful for bidirectional sync between codeberg and GitHub?

  • theolivenbaum 9 hours ago

    Wild times when one can go from an HN post about an interesting open source project to a port to a new language in a matter of hours (WIP, but almost complete: https://github.com/theolivenbaum/copybara)

  • bellowsgulch 2 hours ago

    I already shared git-fetch-file in another comment here, but the .git-remote-files manifests seem a lot nicer than whatever this thing does:

        [file "lib/util.py" from "https://github.com/example/tools.git"]
        commit = a1b2c3d4e5f6789abcdef0123456789abcdef01
        branch = master
        comment = Common utility function
    
        [file "config.json" from "https://github.com/example/tools.git"]
        commit = b2c3d4e5f6789abcdef0123456789abcdef012
        branch = master
        target = vendor
        comment = Configuration from tools repo
    
        [file "helper.js" from "https://github.com/another/project.git"]
        commit = c3d4e5f6789abcdef0123456789abcdef0123
        branch = main
        comment = Helper from another project
    
    vs

        core.workflow(
            name = "default",
            origin = git.github_origin(
                url = "https://github.com/google/copybara.git",
                ref = "master",
            ),
            destination = git.destination(
                url = "file:///tmp/foo",
            ),

            # Copy everything but don't remove a README_INTERNAL.txt file if it exists.
            destination_files = glob(["third_party/copybara/**"], exclude = ["README_INTERNAL.txt"]),

            authoring = authoring.pass_thru("Default email <default@default.com>"),
            transformations = [
                core.replace(
                    before = "//third_party/bazel/bashunit",
                    after = "//another/path:bashunit",
                    paths = glob(["**/BUILD"]),
                ),
                core.move("", "third_party/copybara"),
            ],
        )
    
    There seems to be an absolute ton of reference documentation at https://github.com/google/copybara/blob/master/docs/referenc..., whereas I feel like all the people using git-fetch-file just want files from other repos, and sometimes to make some changes to them.

  • neprotivo 13 hours ago

    If you are using Jujutsu you can achieve a basic way to maintain a public repo from a private monorepo with very little code and without Copybara. I wrote up how to do it here: https://vihren.dev/blog/20260625-jj-public-private-workflow/

    • j2kun 13 hours ago

      The main function of copybara is not moving code but modifying code to make it suitable for a different repository structure, build system, etc.

  • willchen 12 hours ago

    i used this tool when i was at google, extremely helpful in open-sourcing things from google3 to github. still, i'm glad to just directly develop on github now :)

  • bellowsgulch 4 hours ago

    Reminds me of git fetch-file.

    “Fetch and sync individual files or globs from other Git repositories, with commit tracking and local-change protection”

    https://github.com/andrewmcwattersandco/git-fetch-file

  • syngrog66 15 hours ago

    That seems like a tool easily adoptable by folks engaging in dark patterns on GitHub, particularly the malware bait repos.

  • lysace 17 hours ago

    Cute name. (Naming is hard and important.)

    • whh 17 hours ago

      That tune is in my head again... again...

  • krick 8 hours ago

    I get that there are use cases for this, but it's surprising to learn that the use-case space is apparently big enough to justify a dedicated tool. I mean, the fact that you need it is a bit shameful on its own, no? Usually, when you need to reuse code between projects, you try to extract it as a separate library/module. Copying between repos is just a lazy solution, because "ain't nobody got time for that".

    • zem 7 hours ago

      it's useful for cases like google's, where they mirror internal code to github or vice versa, and the two versions need a bit of mechanical work every time they are synced (e.g. slightly different tree layout conventions, internal code or docs that you don't want to include in the github version, stripping of references to other internal stuff like bug IDs from comments, etc).

      • krick 3 hours ago

        But if you are gonna extract and open-source the whole self-contained tool, why not just do that and then install in whatever project like you install any other 3rd party tool?

        • zem a few seconds ago

          in order to use third party code within google it needs to be copied into the monorepo, and projects that depend on it need to import the internal version. this requires at the very least setting it up to build via blaze (google's internal build system, open sourced as bazel), and frequently adjusting its directory layout to fit the monorepo conventions.

          this "mirror and use the local copy" dance is exactly how "any other third party tool" works within google.