Rust in Large Organizations Notes, Hacker News


  • FB dev env – backend services repo – is mostly C++ and Java. Very polyglot environment. Glued together with Buck, FB’s Bazel.

    • Buck: Language agnostic. Supports Rust.
    • rustc drops in quite nicely, basically equivalent to C compiler.
    • wanted to use cargo but it just does too much to fit in
    • need to delineate parts of cargo that are desired with those that conflict with Buck
    • ecosys is big advantage for Rust but hard to separate from cargo
    • current scheme:
      • big cargo.toml including all the things used in internal repo
      • cargo builds artifacts that are presented to buck
      • buck can link against those
      • reasonably successful
      • but approaching 700 crates in transitive dep graph, getting very cumbersome to rebuild etc
      • plus pinned to a specific version of compiler (prebuilt artifacts)
      • works ok but build.rs build scripts are a big complication
    • specific cargo pain points:
      • build scripts
      • “features” feature
      • a lot of crates don’t use features the way they’re intended – they’re used for exclusive A or B choices
        • this creates the possibility to break the build
        • need some sort of “cfg” feature that represents forks of a crate
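One crate-side mitigation for the exclusive A-or-B pattern is a compile-time guard. The sketch below assumes two hypothetical features, `backend-a` and `backend-b` (neither name comes from the notes), and turns the "both enabled" case into an explicit build error.

```rust
// Hypothetical feature names; a real crate would declare them in Cargo.toml.
// Cargo unifies features across the dependency graph, so two dependents can
// each enable one of these and the crate ends up compiled with both.
#[cfg(all(feature = "backend-a", feature = "backend-b"))]
compile_error!("features `backend-a` and `backend-b` are mutually exclusive");

/// Reports which backend this build selected.
pub fn selected_backend() -> &'static str {
    if cfg!(feature = "backend-a") {
        "a"
    } else if cfg!(feature = "backend-b") {
        "b"
    } else {
        "none"
    }
}

fn main() {
    println!("selected backend: {}", selected_backend());
}
```

The guard only makes the conflict visible; it does not give the build system a way to express the A-or-B choice, which is the gap the notes point at.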
  • Google does a similar thing for Fuchsia

    • cargo builds 3rd party artifacts, normal build consumes those
    • problems:
      • handful of 3rd party artifacts depend on things built in tree
      • want to be able to do partial builds, e.g. w/o a feature, or just for some targets
      • developing for a new OS, so we compile some code for host, some for target
      • presently do 2 full builds, but it’s a pain
      • don’t have as much control over the flags getting passed to rustc as we’d like
      • dep flags and linker flags aren’t as specific as we need to distribute deps that are needed for indiv targets
      • prototype using “cargo raze” (use “gn” (from Chrome) to generate ninja files)
      • based on a modification of cargo raze that generates bazel build files
        • has its own handling of build.rs stuff
        • rather than outputting build files, it outputs a json format that could be the basis for the proposed “cargo build plans” feature
          • would be good to know what inputs etc are needed, how this would fit for Buck
          • can Buck consume internal files?
    • gn is aware of the concept of a Rust target
  • cumulo build system:

    • doesn’t use cargo, invokes rustc directly
    • cargo just builds json
    • build all deps as shared libraries, whether or not they want that
      • .so libraries, .rmeta files
      • hits a lot of problems
    • ran into problems, notably lack of support for build.rs – have to reimpl cargo
    • building for 2 different targets
      • have own platform
      • linux target for procedural macros
    • need sometimes to pass flags that are target specific, build a target config map
    • would prefer to use cargo
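A target config map of the kind mentioned above could look roughly like this. It is a minimal sketch: the triple `x86_64-unknown-custom` is a hypothetical stand-in for an in-house platform, and the command line is only assembled, not executed.

```rust
use std::collections::HashMap;

/// Maps a target triple to the extra rustc flags that target needs.
type TargetConfig = HashMap<String, Vec<String>>;

/// Builds the argument list for one rustc invocation (not executed here).
fn rustc_args(config: &TargetConfig, target: &str, source: &str) -> Vec<String> {
    let mut args = vec!["--target".to_string(), target.to_string()];
    if let Some(extra) = config.get(target) {
        args.extend(extra.iter().cloned());
    }
    args.push(source.to_string());
    args
}

fn example_config() -> TargetConfig {
    let mut config = TargetConfig::new();
    // Hypothetical in-house platform triple: abort on panic.
    config.insert(
        "x86_64-unknown-custom".to_string(),
        vec!["-C".to_string(), "panic=abort".to_string()],
    );
    // Linux host target, used for procedural macros: no extra flags.
    config.insert("x86_64-unknown-linux-gnu".to_string(), Vec::new());
    config
}

fn main() {
    let config = example_config();
    for target in ["x86_64-unknown-custom", "x86_64-unknown-linux-gnu"] {
        println!("rustc {}", rustc_args(&config, target, "src/lib.rs").join(" "));
    }
}
```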
  • does cargo raze support build.rs?

    • has some builtin support for build.rs?
    • not automatic: you declare purpose of build.rs
    • things that do rustc version detection?
    • sometimes you want to (e.g.) disable build.rs that supply native deps which come from bazel
  • why can’t you run build.rs as part of the build tool?

    • fundamental problems:
      • no declared inputs, no declared outputs
      • buck / bazel etc has to know what files the build script is consuming, producing etc
    • also, they are arbitrary execution, which can be a security concern
      • proc macros have some similar concerns.
        • e.g., pest which looks at cargo source dir env variable and finds your grammar def’n file
          • doesn’t fit well
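For contrast, here is a sketch of a codegen-style build script that stays within what cargo lets it declare today: it names its one input with `cargo:rerun-if-changed` and writes only under `OUT_DIR`. The input file `grammar.def` and the generated constant are hypothetical. Nothing in cargo enforces that the script touches no other files, which is the problem described above.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Reads one declared input and writes one generated file into `out_dir`.
fn generate(input: &Path, out_dir: &Path) -> io::Result<PathBuf> {
    let text = fs::read_to_string(input)?;
    let output = out_dir.join("generated.rs");
    // Emit a constant holding the number of lines in the input file.
    let code = format!("pub const LINE_COUNT: usize = {};\n", text.lines().count());
    fs::write(&output, code)?;
    Ok(output)
}

fn main() -> io::Result<()> {
    // Hypothetical input file name.
    let input = Path::new("grammar.def");
    // Declared input: cargo reruns the script only when this file changes.
    println!("cargo:rerun-if-changed={}", input.display());
    // Cargo sets OUT_DIR for build scripts; fall back to the temp dir elsewhere.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);
    // Skip generation when the hypothetical input is absent so the sketch
    // also runs outside a cargo build.
    if input.exists() {
        generate(input, &out_dir)?;
    }
    Ok(())
}
```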
  • one thing that was discussed years ago:

    • capability system for build.rs that restrict what scripts can do
    • e.g., read from this directory, write to that one
    • cargo can then audit / sandbox to enforce said rules
      • run build script in a sandbox
        • e.g. crosvm has an impl of this inside of chrome; all crosvm devices run in their own jail
      • nontrivial engineering effort
    • could do at a higher level, sandbox
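The capability idea can be sketched as a declared set of readable and writable directories plus a check applied to each file access. All type and function names below are hypothetical. A real implementation would also need OS-level sandboxing and path canonicalization, since this prefix check does not resolve `..` components or symlinks.

```rust
use std::path::{Path, PathBuf};

/// Directories a build script has declared it will read from and write to.
struct Capabilities {
    read: Vec<PathBuf>,
    write: Vec<PathBuf>,
}

enum Access {
    Read,
    Write,
}

impl Capabilities {
    /// Returns true if `path` lies under a directory declared for `access`.
    /// The comparison is component-wise, so "protocol" is not under "proto".
    fn allows(&self, access: Access, path: &Path) -> bool {
        let roots = match access {
            Access::Read => &self.read,
            Access::Write => &self.write,
        };
        roots.iter().any(|root| path.starts_with(root))
    }
}

fn main() {
    let caps = Capabilities {
        read: vec![PathBuf::from("proto")],
        write: vec![PathBuf::from("out")],
    };
    for (access, path) in [(Access::Read, "proto/api.proto"), (Access::Write, "src/lib.rs")] {
        let verdict = if caps.allows(access, Path::new(path)) {
            "allowed"
        } else {
            "denied"
        };
        println!("{path}: {verdict}");
    }
}
```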
  • jeremy: build scripts classified into 3 or 4 distinct types, is this complete?

    • doing codegen. read a file, bindgen, etc
    • gateway to some other library, using pkg-config or something to find the library, or they build it from source
    • feature detection on rustc
    • “scary ones” – database reads, timestamps
  • plausibly could address those use cases in other ways

    • feature detection is an obvious one, e.g. we had an rfc for compiler versions
  • version compat is a common thing

  • what version of rust are people using?

    • stable
    • “Stableish” – Bootstrap Nightly

  • who here is using toolchains distributed by rust?

    • ms (partially), mozilla, libra
  • why a custom toolchain?

    • config.toml tweaks
      • use clang’s version of some unwinding code
      • custom linker
      • panic=abort
    • custom targets
    • compliance reasons (wanting to build from source for security reasons)
  • bootstrapping compliance

    • where to get initial rust version?
    • several attempts:
      • most successful is using mrustc at version 1.22 and building from there
      • ms, google did that
    • is there a possibility of long term drift?
      • builds are not quite reproducible at present, but almost
    • there was a point where the build w/ mrustc and the build with the toolchain had non-matching hashes
      • might have to tweak the paths
      • in principle it can be done, should maybe prioritize it
  • maybe have an approved “how to bootstrap from C” documentation

  • specific reason fb builds from source:

    • want to always have the option to apply a local patch
    • don’t want to get stuck with a “we must have this patch yesterday” scenario and have to figure out how to apply patch then
  • in most cases, also building llvm, want to share llvm for cross-lang LTO

    • must have a newer LLVM than what rust ships with
  • some folks have cross-lang LTO working

    • but rustc doesn’t want to produce bitcode files
    • pass /bin/echo as the linker
  • pgo – coming soon

  • fb uses after the fact binary rewriting

  • splitting out linker was a potential change to rustc or cargo that google wants

  • would be interesting to know “here is what must be passed to gcc to successfully link”

  • another option: give a python script as the linker

    • turns out servo does it, too
  • show of hands survey:

    • “who is interested in a common backend for ‘those things’”
      • nobody knows what that means
  • buck needs a “fully specified dep dag”, seems like a common thing for other build systems

    • seems like we have to do a few cases to work out the general rules first
  • rudimentary cargo build plan support:

    • gives a dag of rustc executions
    • but it’s too low level for buck, also bazel
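To make "a dag of rustc executions" concrete, the sketch below models a plan as invocations with dependency edges and derives one valid execution order with Kahn's algorithm. The `Invocation` struct and the crate names are hypothetical, and this is not cargo's actual build-plan JSON schema.

```rust
use std::collections::VecDeque;

/// One compiler invocation in a build plan; `deps` are indices of
/// invocations that must finish first.
struct Invocation {
    name: &'static str,
    deps: Vec<usize>,
}

/// Kahn's algorithm: returns a valid execution order, or None on a cycle.
fn execution_order(plan: &[Invocation]) -> Option<Vec<usize>> {
    // Number of unfinished dependencies per invocation.
    let mut pending: Vec<usize> = plan.iter().map(|inv| inv.deps.len()).collect();
    // Reverse edges: who is waiting on each invocation.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); plan.len()];
    for (i, inv) in plan.iter().enumerate() {
        for &d in &inv.deps {
            dependents[d].push(i);
        }
    }
    let mut ready: VecDeque<usize> = (0..plan.len()).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::with_capacity(plan.len());
    while let Some(i) = ready.pop_front() {
        order.push(i);
        for &j in &dependents[i] {
            pending[j] -= 1;
            if pending[j] == 0 {
                ready.push_back(j);
            }
        }
    }
    if order.len() == plan.len() {
        Some(order)
    } else {
        None
    }
}

fn example_plan() -> Vec<Invocation> {
    vec![
        Invocation { name: "libc", deps: vec![] },
        Invocation { name: "log", deps: vec![] },
        Invocation { name: "mylib", deps: vec![0, 1] },
        Invocation { name: "mybin", deps: vec![2] },
    ]
}

fn main() {
    let plan = example_plan();
    match execution_order(&plan) {
        Some(order) => {
            for i in order {
                println!("run invocation for {}", plan[i].name);
            }
        }
        None => println!("cycle in build plan"),
    }
}
```

A list of individual compiler invocations like this carries no package-level structure, which may be part of why it is too low level for Buck and Bazel.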
  • pressure: every once in a while people propose “rewriting cargo.toml” into the tree

    • so far resisted that
    • a possible outcome buck has thought of:
      • buck support for cargo.toml
      • ton of code that’s open source, and people (natch) don’t want to build w/ buck out of tree
      • want ability to simultaneously maintain buck / cargo support
      • currently done by hand and horrible
      • internally even people want this for mac / win builds which buck doesn’t support
      • google w/ gn does something similar, keeps cargo.toml in order to upstream it
        • in some cases can generate a cargo.toml file programmatically
        • also imp’t for IDE support
  • IDE support

    • RLS kind of working with buck
    • knowing laughter :)
    • problematic assumptions: e.g., searching the filesystem for cargo.toml, but it’s millions of files
    • symptom of a larger thing
      • cargo is designed for managing rust code
      • assumes source tree is mostly rust code
      • but often rust is embedded in a large source tree with tons of non-rust
        • so having some “root for all rust code” where you search below is problematic
      • top-level directory not gonna work
        • always having to create artificial “root” directories
    • rust-analyzer avoids this by not baking cargo in as deeply
      • but still has this “top level directory” model that contains all the rust code which means a small amount of rust amongst everything else
  • generating a cargo.toml for 1 project works well, but when you have multiple targets that interact

  • cumulo has a ton of C and Rust code that must be all combined into one big final artifact

    • IDE support that avoids cargo is a must
    • current state of the art: ctags
  • cramertj: cargo.toml is basically the intermediate repr for specifying deps

    • are there other things one might want?
    • build system has its own custom language to do that description
      • can use that to generate cargo.toml files though for IDE etc
        • what changes might one want in a “non-cargo IDE language”?
          • maybe cargo would work fine
  • manish: does this also cause problems for clippy and rustfmt?

    • cargo.toml is also useful for this
  • who uses clippy? most folks

  • rustfmt? most folks

    • fb invokes it on individual files for that
  • libra uses cargo to build

    • “cacheability” (sccache) has gotten worse over time
    • procedural macros aren’t getting cached (dylibs)
    • are other people doing anything with this?
    • ff has a distributed cache in the office
    • (buck does caching of everything)
      • native deps? also integrated into buck
      • assume that if a C dep changes, rust must be rebuilt?
      • -L native is not very well-scoped (just to a directory, not specific libs)
      • problem: can’t cache link steps as a result
      • maybe also part of the problem with sccache
      • in buck, each lib gets its own directory, sidestepping this problem
  • linker want:

    • ability to specify a specific mapping from link name to the native library
    • option to ignore link directories or transform
    • in buck case, if you have a dep on a native library, you get two options (-lfoo and full path to foo)
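The first two wishes can be sketched as a rewrite pass over linker arguments: an explicit map from link name to a full library path, with search directories dropped. The function name and the paths are hypothetical, and only the joined `-lNAME` and `-LDIR` forms are handled.

```rust
use std::collections::HashMap;

/// Rewrites `-lNAME` linker arguments to full paths using an explicit map
/// and drops `-LDIR` search directories, so the link step no longer depends
/// on what happens to be in a directory.
fn rewrite_link_args(args: &[&str], libs: &HashMap<&str, &str>) -> Vec<String> {
    let mut out = Vec::new();
    for &arg in args {
        if let Some(name) = arg.strip_prefix("-l") {
            match libs.get(name) {
                // Known library: replace the name with its exact path.
                Some(path) => out.push(path.to_string()),
                // Unknown library: leave the argument untouched.
                None => out.push(arg.to_string()),
            }
        } else if arg.starts_with("-L") {
            // Search directory: ignored in this sketch.
        } else {
            out.push(arg.to_string());
        }
    }
    out
}

fn main() {
    let mut libs = HashMap::new();
    // Hypothetical per-library output path.
    libs.insert("foo", "/build/out/foo/libfoo.a");
    let args = ["-L/usr/lib", "-lfoo", "-lbar", "main.o"];
    println!("{:?}", rewrite_link_args(&args, &libs));
}
```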
  • crate features, misuse thereof:

    • people seem to want option to have mutually exclusive features
    • want to have impls clone etc for testing but not in a release build
    • hacked up something using cargo features but doesn’t work all the time
    • problems:
      • dev dependency foo with feature “testing”
      • sometimes testing gets turned on semi-randomly (???)
      • but you can also accidentally use “testing” in a normal tree
    • deps for build scripts leak through to the real graph, perhaps part of the “semi-random” behavior
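One plausible reading of the "semi-random" behavior is feature unification: if every request for a crate's features is unioned into one set, a feature asked for only by a dev-dependency edge shows up wherever that edge is part of the resolved graph. The toy model below illustrates the effect. It is not cargo's resolver, and all names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One edge in a dependency graph: `dependent` asks for `features` on `krate`.
struct Request {
    dependent: &'static str,
    krate: &'static str,
    features: &'static [&'static str],
    dev_only: bool,
}

/// Computes the feature set each crate is built with when all requests are
/// unioned. With `include_dev` set, dev-dependency requests are merged in
/// too, which is how a "testing" feature can leak into a normal build.
fn unify(
    requests: &[Request],
    include_dev: bool,
) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut out: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for r in requests {
        if r.dev_only && !include_dev {
            continue;
        }
        out.entry(r.krate).or_default().extend(r.features.iter().copied());
    }
    out
}

fn example_requests() -> Vec<Request> {
    vec![
        Request { dependent: "app", krate: "foo", features: &[], dev_only: false },
        Request { dependent: "app-tests", krate: "foo", features: &["testing"], dev_only: true },
    ]
}

fn main() {
    let requests = example_requests();
    for r in &requests {
        println!(
            "{} requests {:?} on {} (dev-only: {})",
            r.dependent, r.features, r.krate, r.dev_only
        );
    }
    println!("dev deps excluded: {:?}", unify(&requests, false));
    println!("dev deps merged:   {:?}", unify(&requests, true));
}
```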
  • designing from the wrong direction, perhaps?

    • a lot of requirements coming up that are “above and beyond” existing cargo spec and design
    • contra: goal is to have cargo co-exist with buck / bazel / etc, these are the features needed for that?
  • do we want to build another tool that is not cargo?

    • but everybody already has a tool and wants to use it
    • but how can we do minimal work so that integration of cargo with these other tools is smoother
      • working with rest of rust ecosys
  • de facto standard that crates.io and cargo have created

    • defined entirely by impl of cargo
    • only access at present is through cargo’s impl
    • refactoring cargo into indep chunks with better interfaces might be the sol’n (and has been discussed)
      • cargo build plans, but they’re not there yet
    • key thing: version resolution, very much in cargo’s domain, would be good to specify
  • external dependencies and FFI?

    • can we use FFI to talk to rust?
    • want module boundary between rust things, using ffi
    • today: build scripts in cargo exist, common thing is to build and link to native libraries
      • one of the things that cargo raze does, you can describe the purpose of a build.rs (e.g., primarily to produce that 3rd party lib)
      • but you can translate that to a dep for that native library in your build system
  • summarize action items?

    • cramertj wants to know what
    • dtolnay is working on potential design ideas for a successor to build.rs
      • cargo metadata description to specify what it is doing, maybe replace build.rs?
      • just listing inputs would be a huge improvement
        • yes but we want something that’s easier than build.rs today, to incentivize it
    • caching, can we improve it
      • some of it may be low-hanging fruit, e.g. on mac, .a files have timestamps
      • but part of it is the growing popularity of procedural macros (.so are uncacheable by sccache)
        • if linker were more predictable, sccache could handle it, but it’s not
        • might be able to handle by separating out linking
  • how to translate cargo.toml etc?

    • buck today runs cargo, takes output with dep info and rlib files
    • but new tool goal is to determine from cargo metadata
      • no way of “definitively connecting” resolved deps with unresolved deps
  • cargo vendor tends to be a bit overaggressive

    • lots of things people want, seems to vary between groups
  • when developing procedural macros, could do better job of noticing token stream output hasn’t changed ..

    • incremental
    • sccache sometimes handles that well (e.g. w/ build.rs)
  • related topic: distributed builds

    • sccache has support for that
      • but maybe sends whole dep folder, not always ok
      • would need more precise dep information to handle that (passing precise info for transitive dependencies)
        • --extern is precise, but transitive deps are still figured out by rustc
    • related: would be nice if, for rustc, could pass all the sources explicitly
      • in buck do you list all sources?
        • yes but a lot of globs :)
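The precision gap can be seen in how a rustc command line is put together: direct dependencies are named file by file with `--extern NAME=PATH`, while transitive ones are only reachable through `-L dependency=DIR` search directories. This sketch assembles such a command without running it. The helper names and artifact paths are hypothetical.

```rust
use std::path::PathBuf;
use std::process::Command;

/// A direct dependency: crate name plus the exact artifact to link against.
struct ExternCrate {
    name: String,
    rlib: PathBuf,
}

/// Builds (but does not run) a rustc command with explicit inputs.
fn rustc_command(source: &str, externs: &[ExternCrate], transitive_dirs: &[PathBuf]) -> Command {
    let mut cmd = Command::new("rustc");
    cmd.arg("--crate-type").arg("lib").arg(source);
    for e in externs {
        // Direct deps are named precisely: --extern NAME=PATH.
        cmd.arg("--extern")
            .arg(format!("{}={}", e.name, e.rlib.display()));
    }
    for dir in transitive_dirs {
        // Transitive deps are only given as search directories; rustc picks
        // the matching files itself, so a cache cannot tell which were read.
        cmd.arg("-L").arg(format!("dependency={}", dir.display()));
    }
    cmd
}

fn main() {
    let externs = [ExternCrate {
        name: "serde".to_string(),
        // Hypothetical artifact path.
        rlib: PathBuf::from("deps/libserde.rlib"),
    }];
    let cmd = rustc_command("src/lib.rs", &externs, &[PathBuf::from("deps")]);
    println!("{:?}", cmd);
}
```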
  • would be nice to have a tool that handled all the easy cases, with room for “extra” cases here and there

  • alex: interested in solving a lot of these issues and have thoughts

    • open to talking later about this stuff
    • a lot of small details, bug fixes, etc – long road, no silver bullet
  • some kind of “enterprise cargo” place to hold this discussion(s)

  • a lot of needs boil down to:

    • quick fix combined with longer re-architecture