Yeah, but even high-powered hardware can take a "major" hit from a GC pause when your application is extremely latency sensitive.
IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.
EDIT: I mean to say that many of my colleagues who write realtime software dismiss new languages as including GC baggage by default (because so many do!). So, hey, good that the video calls this out.
> IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.
It would, and definitely should, move in the direction of safer languages than C.
The biggest problems I see are tooling and legacy. Tooling, because there's a ginormous amount of testing and design software that "works with C" (whatever that means in the context of the tool). Legacy, because everyone already has their 20-year-old codebases: it's just not convenient to start maintaining two languages, and switching the old code to Rust is just plain impossible economically.
A third problem is the compiler. LLVM (rustc is a frontend to it, no?) is a really good choice, but gcc has an enormous advantage in supporting so many small platforms, and that's very significant in this area of software development.
On the plus side, I've really gotten the impression that the Rust folks are truly trying to make adoption as smooth as possible. If that works, and Rust proves to be much better than C for these kinds of systems, I'd guess adopting Rust rather than not would start looking economically viable to companies. I mean, in the end that's what matters to them the most, and it's not easy to replace all your C ninjas with competent Rust writers.
Rather than a ground-up rewrite, I expect people will begin using Rust in the same way that Firefox has: identify individual components that would most benefit from Rust, segment those components off behind well-defined C interfaces, then write a compatible Rust lib using Rust's ability to expose C interfaces.
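To make the "Rust lib behind a C interface" idea concrete, here is a minimal sketch. The component and its name (`component_sum`) are hypothetical, not taken from Firefox; the only point is that a Rust function can be given the C calling convention and take the pointer-plus-length arguments a C caller would pass.

```rust
use std::slice;

/// Hypothetical component: sums a buffer of 32-bit values.
/// `extern "C"` gives the function the C calling convention. A real library
/// would also carry the `no_mangle` attribute so the symbol keeps this exact
/// name for the C linker; that is left out of this sketch.
pub extern "C" fn component_sum(data: *const u32, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` readable u32 values,
    // the same contract a C implementation of this interface would document.
    let values = unsafe { slice::from_raw_parts(data, len) };
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

fn main() {
    let buf = [1u32, 2, 3, 4];
    // Called from Rust here; a C caller would pass a pointer and a length.
    let total = component_sum(buf.as_ptr(), buf.len());
    println!("{}", total);
}
```

Everything behind that boundary can be ordinary safe Rust; only the thin layer that touches raw pointers needs `unsafe`.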
If you're considering switching to Rust for code/memory safety reasons, SaferCPlusPlus[1] may be an easier/cheaper/low risk option. It allows you to add memory safety to your existing code base in a completely incremental way, with no dependency risk. (At the moment, standard library support is required though.)
Are you confusing SaferCPlusPlus with a different library? SaferCPlusPlus is a new library that makes it practical to stick to a memory safe subset of C++ (i.e. no native pointers, no native arrays, no std::array<>, no std::vector<>, etc.).
Using the SaferCPlusPlus library to replace all uses of C++'s unsafe elements does result in code that is as memory safe as Rust, or any other modern language. The main shortcoming at the moment is that it doesn't yet provide memory safe replacements for all of the standard library's unsafe elements, just the most commonly used ones.
Create a vector. Push an element onto it. Take a reference to that element with operator[]. Clear the vector. Call a method on that dangling reference.
Create an object on the stack. Return a reference to that object. Call a method on that reference.
Create a vector. Push an element onto it. Call a method on that element that clears the vector and then calls another virtual method on itself, via the this pointer.
Accidentally share a vector between threads. Race push_back() and remove().
Etc. etc. We didn't implement lifetimes for no reason.
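For comparison, here is a rough sketch of how the "vector shared between threads" case looks in Rust. The function name `push_from_threads` is made up for illustration. Handing a plain `&mut Vec` to several spawned threads is rejected at compile time, so the sharing has to be spelled out with `Arc` and `Mutex` from the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each push `per_worker` values into one
/// shared vector, then returns the final length.
fn push_from_threads(workers: usize, per_worker: usize) -> usize {
    // Arc gives shared ownership across threads, Mutex gives exclusive access.
    let shared = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for id in 0..workers {
        let shared = Arc::clone(&shared);
        handles.push(thread::spawn(move || {
            for n in 0..per_worker {
                // The vector is only reachable through the lock guard.
                shared.lock().unwrap().push(id * per_worker + n);
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let len = shared.lock().unwrap().len();
    len
}

fn main() {
    println!("{}", push_from_threads(4, 100));
}
```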
Additionally, the pointer registration mechanism that that library uses has a runtime performance cost worse than a GC write barrier (because it incurs writes on reads).
> Create a vector. Push an element onto it. Take a reference to that element with operator[]. Clear the vector. Call a method on that dangling reference.
> Create an object on the stack. Return a reference to that object. Call a method on that reference.
References are one of the unsafe C++ elements that SaferCPlusPlus is intended to be used to replace [1].
> Create a vector. Push an element onto it. Call a method on that element that clears the vector and then calls another virtual method on itself, via the this pointer.
Yes, that series of operations is safe. A related example from the "msetl_example.cpp" file:
typedef mse::mstd::vector<int> vint_type;
mse::mstd::vector<vint_type> vvi;
{
    vint_type vi;
    vi.push_back(5);
    vvi.push_back(vi);
}
auto vi_it = vvi[0].begin();
vvi.clear();
try {
    /* At this point, the vint_type object is cleared from vvi, but it has not been deallocated/destructed yet because it
    "knows" that there is an iterator, namely vi_it, that is still referencing it. At the moment, std::shared_ptrs are being
    used to achieve this. */
    auto value = (*vi_it); /* So this is actually ok. vi_it still points to a valid item. */
    assert(5 == value);
    vint_type vi2;
    vi_it = vi2.begin();
    /* The vint_type object that vi_it was originally pointing to is now deallocated/destructed, because vi_it no longer
    references it. */
}
catch (...) {
    /* At present, no exception will be thrown. We're still debating whether it'd be better to throw an exception though. */
}
I agree with the gist though. This kind of thing should be prevented at compile time. Rust has an excellent static analyzer/enforcer built into its compiler. Arguably, it would be a service to the community to unbundle it from the Rust compiler and make it available for application to C++ code as well. Arguably.
> Accidentally share a vector between threads. Race push_back() and remove().
SaferCPlusPlus addresses the sharing of objects between asynchronous threads [2]. A particular shortcoming of C++ with respect to object sharing is that it doesn't have a notion of "deep const/immutability".
> Additionally, the pointer registration mechanism that that library uses has a runtime performance cost worse than a GC write barrier (because it incurs writes on reads).
Um, yeah, modern code should try to avoid the use of general pointers (and generally does). Most modern languages don't provide general pointers. SaferCPlusPlus makes them safe and slow (and available for easy porting of legacy code). When writing new code you would instead, when required, use one of the faster pointer types available in the library.
Don't interpret SaferCPlusPlus as an assertion that C++ is a uniformly better language than Rust or other modern languages. It's more of a suggestion that C++ and existing C++ code bases can be salvaged to a greater degree than one might think.
> References are one of the unsafe C++ elements that SaferCPlusPlus is intended to be used to replace [1].
OK, so you can't use references. Then, as I said before, your pointer replacements have a runtime performance cost worse than GC write barriers.
> Yes, that series of operations is safe. A related example from the "msetl_example.cpp" file:
I don't think you understood me. I mean the this pointer. "this" is hardwired into C++ to be an unsafe pointer.
> I agree with the gist though. This kind of thing should be prevented at compile time. Rust has an excellent static analyzer/enforcer built into its compiler. Arguably, it would be a service to the community to unbundle it from the Rust compiler and make it available for application to C++ code as well. Arguably.
Not possible. It's totally incompatible with existing C++ designs.
> Um, yeah, modern code should try to avoid the use of general pointers (and generally does). Most modern languages don't provide general pointers.
I think you're getting lost in the weeds of what a "general pointer" is and is not. It doesn't matter.
The point is that if your references track their owners at runtime, then you are just creating a GC. If the overhead of doing that is worse than a traditional GC (which, if you are doing that much bookkeeping, it will be), then there's little purpose to it.
> OK, so you can't use references. Then, as I said before, your pointer replacements have a runtime performance cost worse than GC write barriers.
The library provides three types of pointers - "registered", "scope" and "refcounting". I believe you are referring to the registered pointers, that indeed have significant cost on construction, destruction and assignment. But registered pointers are really mostly intended to ease the task of initially porting legacy code. New or updated code would instead use either "scope" pointers, which point to objects that have (execution) scope lifetime, or "refcounting" pointers. Scope pointers have zero extra runtime overhead, but are (at the moment) lacking the needed "static enforcer" to ensure that scope objects are indeed allocated on the stack. (Their type definition does prevent a lot of potential inadvertent misuse, but not all. And Ironclad C++ does have such a static enforcer.)
> I don't think you understood me. I mean the this pointer. "this" is hardwired into C++ to be an unsafe pointer.
You're right, that's a good point. But really it's a practical issue rather than a technical one. I mean technically, use of the "this" pointer should be replaced with a safer pointer, just like any other native pointer.
For example this is technically one of the safe ways to implement it in SaferCPlusPlus:
class CA { public:
    template<class safe_this_pointer_type, class safe_vector_pointer_type>
    void foo1(safe_this_pointer_type safe_this, safe_vector_pointer_type vec_ptr) {
        vec_ptr->clear();
        /* The next line will throw an exception (or whatever user specified behavior). */
        safe_this->m_i += 1;
    }
    int m_i = 0;
};
int main() {
    mse::TXScopeObj<mse::mstd::vector<CA>> vec1;
    vec1.resize(1);
    auto iter = vec1.begin();
    iter->foo1(iter, &vec1);
}
That is, technically, if you're going to use the "this" pointer, explicitly or implicitly, you should pass a safe version of it (in this case "iter"). But yeah, in practice I don't expect people to be so diligent. I wonder how often this type of scenario arises in practice?
So do I understand correctly that the Rust language allows for the same type of code, but the compiler won't build it unless it can statically deduce that it is safe?
> Not possible. It's totally incompatible with existing C++ designs.
Even if you prohibit the unsafe elements? Including (implicit and explicit) "this" pointers?
Hmm, a more practical approach might be to mirror the GC languages and only permit (not-null) refcounting pointers as elements of dynamic containers such as vectors. That would ensure that references never outlive their targets, thereby eliminating the implicit "this" pointer issue. I think. Is that how Rust does it?
No, safe Rust only has safe references, and that includes "this" ("self" in Rust). Because the lifetimes are part of the type, it does not require the runtime overhead of reference counting.
Rust's references behave like plain raw C/C++ pointers at runtime, without any bookkeeping code running at all.
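A small sketch of the "no bookkeeping" point, using only standard library calls. The helper names are made up for illustration. It compares the size of a reference with a raw pointer, and shows that borrowing from an `Rc` does not touch the reference count, whereas cloning the `Rc` does.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// True when a shared reference occupies exactly one raw pointer.
fn reference_is_pointer_sized() -> bool {
    size_of::<&u64>() == size_of::<*const u64>()
}

/// Returns the strong count before and after taking a plain borrow.
fn counts_around_borrow() -> (usize, usize) {
    let counted = Rc::new(5u64);
    let _second = Rc::clone(&counted); // runtime bookkeeping: count goes 1 -> 2
    let before = Rc::strong_count(&counted);
    let plain: &u64 = &counted; // a borrow: no counter is touched
    let after = Rc::strong_count(&counted);
    assert_eq!(*plain, 5);
    (before, after)
}

fn main() {
    println!("{}", reference_is_pointer_sized());
    println!("{:?}", counts_around_borrow());
}
```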
The magic all lies in the compile-time borrow checker, which roughly works like this:
- All data is accessed either through something on the stack or in static memory.
- Accessing data, say by creating a reference to it,
causes the compiler to "borrow" the value for the scope in which the reference
is alive.
- The references can be alive for any scope equal or smaller than for which
access to the data itself is valid.
- References carry the original scope for which they are alive as a
template-parameter-like thing called a "lifetime parameter".
Note that Rust's use of the word "lifetime" is thus a bit narrower than the
one used in C++, since it just talks about stack scopes, and not the lifetime
of the actual value as would be tracked by a GC or ref counting.
Example:
let x = true;
let r = &x;
Here, the type of r would be inferred as something like `Reference<ScopeOfXVariable, bool>`.
(The actual type in Rust would be a `&'a T` with
'a = scope of x, and T = bool).
- Because the scope is tracked as part of the reference type,
it is possible to copy/move/transform/wrap references safely, since
the compiler will always "know" about the original scope and thus can
check that you never end up in a situation where you accidentally outlive the
thing you borrowed, say if you try to return a type that contains a reference
somewhere deep down.
- The borrow itself acts as a compile-time read/write lock on the thing you referenced,
so for as long as the reference is alive, the compiler prevents
you from changing or destroying the referenced thing. Example:
// This errors:
let mut a = 5;
let b = &a;
a = 10; // ERROR: a is borrowed
println!("{}", *b);
// This is fine:
let mut c = 100;
{
let d = &c;
println!("{}", *d);
}
c = 50;
- The above examples just use `&` for references, but Rust has two reference types:
- &'a T, called a "shared reference", which causes a "shared borrow".
- &'a mut T, called a "mutable reference", which causes a "mutable borrow".
- Both behave the same in principle, but have different restrictions and guarantees:
- A mutable borrow is exclusive, meaning no other borrow of the same data
is allowed while the &mut T is alive, but it allows you to freely change the T through
the reference.
- A shared borrow may alias, so you can have multiple &T pointing
to the same data at the same time, but you are not allowed to freely change T through
the reference.
- (If those two cases are too rigid, there is also an escape hatch that
a specific type may opt into to allow mutation of itself through a shared reference, with
exclusivity checked through some other mechanism like runtime borrow counting.)
- Through these two reference types, Rust libraries can build abstractions with arbitrary APIs
without losing the borrow checker guarantees. E.g., the "reference to vector element"
example boils down to this:
let mut v = Vec::new();
v.push(1);
let r = &v[0]; // the reference in r now has a shared borrow on v.
v.push(2); // push tries to create a mutable borrow of v, which conflicts with the
           // borrow kept alive by r, so you get a borrow error at compile time.
println!("{}", *r);
The important part is that all this is there, by default, for all Rust code in existence, so you cannot accidentally ignore it the way you might ignore a library solution you don't know about, or language features that don't know about the library solutions.
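The "escape hatch" with runtime borrow counting mentioned in the list can be sketched with `std::cell::RefCell` from the standard library. The function name `refcell_demo` is just for illustration. The same shared-versus-exclusive rule applies, but it is checked while the program runs instead of at compile time.

```rust
use std::cell::RefCell;

/// Returns whether a mutable borrow was refused while a shared borrow was
/// alive, plus the final contents of the vector.
fn refcell_demo() -> (bool, Vec<i32>) {
    let cell = RefCell::new(vec![1, 2, 3]);
    let refused;
    {
        let reader = cell.borrow(); // shared borrow, counted at runtime
        // An exclusive borrow is refused while `reader` is still alive.
        refused = cell.try_borrow_mut().is_err();
        assert_eq!(reader.len(), 3);
    } // `reader` is dropped here and the count goes back to zero
    cell.borrow_mut().push(4); // now the exclusive borrow succeeds
    (refused, cell.into_inner())
}

fn main() {
    println!("{:?}", refcell_demo());
}
```

Using `borrow_mut()` instead of `try_borrow_mut()` in the inner block would panic rather than return an error, which is the cost of moving the check to runtime.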
Correct me if I'm wrong, but it looks like this just provides some 'safe' alternatives to unsafe C++ things. It's still up to the diligence of the programmer to not use those things and nothing is getting statically verified.
By contrast, when I write Rust, memory safety (and type safety) are verified by the compiler.
That's right. SaferCPlusPlus is not complete and does not yet include a static verifier/checker.
Without a static verifier, memory safety is not guaranteed, just dramatically improved. And for many cases where there is a large investment in an existing code base, this might still be a more expedient solution. Even if only an interim one.
For example, I would estimate that, with concerted effort, it would take a matter of weeks to "port" the existing Firefox C++ code base to SaferCPlusPlus. Presumably this would dramatically reduce "remote execution", and other memory bugs while we wait for the Rust implementation.
In cases where guaranteed memory safety is desired, you might think of it this way: In Rust, the static checker is built into the compiler. In C++, static checkers/analyzers are separate tools. You could choose to require that your C++ code must be verified to be safe by a static analyzer of your choosing. In C++, it can be difficult/inconvenient to write non-trivial code that fully appeases the static analyzer, just like in Rust. You can use SaferCPlusPlus to make it easier to fully appease the static analyzer (like the Rust language does).
I should also mention "Ironclad C++". It's similar in function to SaferCPlusPlus, but it uses garbage collection (where SaferCPlusPlus does not). It does include a static verifier/enforcer.
As a fan of "memory safety without using GC", I'm rooting for Rust. But I think the idea of achieving memory safety in C++ can be too quickly dismissed.
> In C++, static checkers/analyzers are separate tools. You could choose to require that your C++ code must be verified to be safe by a static analyzer of your choosing.
The problem is that, in C++, there is no such static checker in existence (except ones with GC).
Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)
So then would you agree with the notion that (a practical subset of) C++ combined with a static analyzer could be just as safe and fast as Rust if, hypothetically, there existed an enthusiastic community comparable to Rust's? Or are there intrinsic technical issues? Or syntax issues?
Also, let me throw this notion at you: Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.
That is, instead of requiring that the code be fast and safe or it won't compile, it becomes: If your code is not clearly, intrinsically safe then it will have runtime checks that will slow it down. And the compiler could list any runtime checks that it wasn't able to optimize out.
The reason I suggest this is that memory safety is just the enforcement of certain invariants. There's no reason why we couldn't let the programmer define additional, application specific invariants and have the build process treat them the same way it treats memory access invariants.
So for example, when a user defines a class, it could have a standard member function called "assert_object_invariants()" or something, that the programmer can define. Then anytime a (non-const?) member function is called, the compiler can insert runtime asserts at the beginning and end of the member function call. And again the compiler can tell you when those runtime asserts aren't optimized out. Wouldn't that make sense? I haven't really thought it through.
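Written out by hand, the idea might look roughly like the sketch below. The `Account` type and its invariant are hypothetical, and here the entry and exit checks are placed manually; in the proposal the compiler would insert them and report which ones it could not optimize away.

```rust
/// Hypothetical type whose balance must never exceed its limit.
struct Account {
    balance: u32,
    limit: u32,
}

impl Account {
    fn new(limit: u32) -> Account {
        Account { balance: 0, limit }
    }

    /// The user-defined invariant check from the proposal.
    fn assert_object_invariants(&self) {
        assert!(self.balance <= self.limit, "balance exceeds limit");
    }

    /// A mutating method with the invariant checked on entry and on exit.
    fn deposit(&mut self, amount: u32) -> bool {
        self.assert_object_invariants(); // entry check
        let accepted = match self.balance.checked_add(amount) {
            Some(total) if total <= self.limit => {
                self.balance = total;
                true
            }
            _ => false,
        };
        self.assert_object_invariants(); // exit check
        accepted
    }
}

fn main() {
    let mut acct = Account::new(100);
    println!("{}", acct.deposit(60));
    println!("{}", acct.deposit(50));
    println!("{}", acct.balance);
}
```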
> Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)
The problem is that you still need extra annotations. Namely lifetime annotations (or something similar relating between borrows -- either that, or use a lot of elision which can be crippling). On top of that, the programming style Rust encourages is not the same as the ones you tend to see in C++ codebase, and programming in the C++ style will lead to code that doesn't compile.
> Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.
This might be more tractable (and is an interesting idea). But that optimizer would be hard to write.
> So then would you agree with the notion that (a practical subset of) C++ combined with a static analyzer could be just as safe and fast as Rust
I think this is what the new ISOCPP core guidelines are trying to do? Though they don't go far enough in preventing memory unsafety IIRC (this may have changed).
> The problem is that you still need extra annotations. Namely lifetime annotations
Well, the idea is not to have the static analyzer verify typical C++ code. Just some practical subset. So for example I think it's quite practical to write C++ code that uses only "scope" pointers (basically pointers to objects on the stack) and (not-null) refcounting pointers, that intrinsically don't outlive their targets. Lifetimes would be implied by the types. So wait, what more does Rust's static analyzer give us again? Does it somehow remove the need for refcounting heap objects?
> the programming style Rust encourages is not the same as the ones you tend to see in C++ codebase, and programming in the C++ style will lead to code that doesn't compile.
I have no problem with that. I have no attachment to the "traditional" C++ programming style.
> This might be more tractable (and is an interesting idea). But that optimizer would be hard to write.
Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.
> I think this is what the new ISOCPP core guidelines are trying to do? Though they don't go far enough in preventing memory unsafety IIRC (this may have changed).
The ISOCPP core guidelines approach is to recommend the use of C++'s intrinsically dangerous elements in a way that is "usually safe", but not always, and rely on their static analyzer to catch bugs. So the question becomes, what do you do in the many cases where the static analyzer doesn't know if it's safe or not. You can try to redesign your code so the static analyzer can understand that it's safe. But that's often very inconvenient or has a performance cost. Often the most practical (safe) solution is to resort to something like SaferCPlusPlus.
> So wait, what more does Rust's static analyzer give us again? Does it somehow remove the need for refcounting heap objects?
Refcounting is rarely needed because most sharing is done via "borrows", which usually work via scope-tied "references" which may point to either the stack or the heap.
Implementing and enforcing local scope pointers in C++ via static analysis is not hard. Making it possible to thread borrows through APIs and annotate things with the borrowing semantics (which is what makes Rust avoid refcounting or even allocation costs) requires a bit more work.
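As a rough illustration of threading a borrow through an API, here is a sketch with a hypothetical `Header` type and `parse_header` function. The lifetime parameter `'a` in the signature says the returned struct borrows from the input line, so the result can point into heap or stack data without copying or refcounting.

```rust
/// Hypothetical parsed header whose fields borrow from the input line.
struct Header<'a> {
    name: &'a str,
    value: &'a str,
}

/// Splits "Name: value" without allocating. The `'a` in the signature ties
/// the returned borrows to the lifetime of `line`.
fn parse_header<'a>(line: &'a str) -> Option<Header<'a>> {
    let mut parts = line.splitn(2, ':');
    let name = parts.next()?.trim();
    let value = parts.next()?.trim();
    Some(Header { name, value })
}

fn main() {
    // One input lives on the heap, the other in a stack array.
    let on_heap = String::from("Host: example.org");
    let on_stack = *b"Accept: text/html";
    let stack_text = std::str::from_utf8(&on_stack).unwrap();

    if let Some(h) = parse_header(&on_heap) {
        println!("{} = {}", h.name, h.value);
    }
    if let Some(h) = parse_header(stack_text) {
        println!("{} = {}", h.name, h.value);
    }
}
```

If a `Header` were kept around after the string it borrows from had been dropped, the compiler would reject the program.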
> I have no attachment to the "traditional" C++ programming style.
Right, but at this point you have a very weird looking subset of C++ that can't seamlessly integrate with other libraries, and can't be translated to from regular C++ without significant human intervention -- why not just use Rust?
> Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.
I guess I misunderstood your proposal. This sounds doable. But, again, you'd be using a weird subset of C++ that doesn't seamlessly integrate, and you're just better off using Rust at this point.
Instead of trying to port Rust's guarantees to C++ it makes more sense to use the same principles to organically build on top of C++, in a different way. IMO this is sort of what ISOCPP is trying to do, but they're not quite there yet, and trying to find a compromise between making the language too different and making it safe is hard.
> So the question becomes, what do you do in the many cases where the static analyzer doesn't know if it's safe or not. You can try to redesign your code so the static analyzer can understand that it's safe.
This is always going to be a problem regardless of the static analyzer. You have to design it to reject these cases. Rust does this too; there are some edge cases where you need to design around the borrow checker (though usually this doesn't incur additional cost, and the most common of these are going to be addressed). If designing low level abstractions like vectors and stuff (or doing FFI), Rust gives you an escape hatch ("unsafe"), which has a couple of checks disabled and can be used to write the code you need (verifying safety of a program then just requires verifying that these blocks of code are sound and do not rely on any invariants that can be broken by code outside of them).
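A minimal sketch of that pattern: a safe function whose body uses `unsafe` for something the borrow checker cannot see through, namely handing out two mutable slices into the same buffer. The name `split_halves` is made up; the standard library's `split_at_mut` offers the same kind of interface.

```rust
/// Splits a slice into two non-overlapping mutable halves.
fn split_halves(data: &mut [i32]) -> (&mut [i32], &mut [i32]) {
    let len = data.len();
    let mid = len / 2;
    let ptr = data.as_mut_ptr();
    // SAFETY: [0, mid) and [mid, len) do not overlap and both lie inside
    // `data`, so the two mutable slices can never alias each other.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut values = [1, 2, 3, 4, 5];
    {
        let (left, right) = split_halves(&mut values);
        left[0] = 10;
        right[0] = 30;
    }
    println!("{:?}", values);
}
```

Callers only see the safe signature, so auditing this code for memory safety means auditing the one `unsafe` block.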
> > Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.
> I guess I misunderstood your proposal. This sounds doable. But, again, you'd be using a weird subset of C++ that doesn't seamlessly integrate, and you're just better off using Rust at this point.
My proposal is sort of language independent. I'm just suggesting a better way to address the code safety/correctness issue might be with runtime asserts, because it's more general. Some of the runtime asserts (like the ones regarding memory safety) will be automatically generated by the compiler, and others would be user defined (but compiler placed). And the static analyzer (I guess "the borrow checker" in Rust) would be repurposed to strip out the unnecessary runtime checks. And the compiler/optimizer would tell you which runtime asserts it was unable to optimize out. (Presumably good Rust code would result in all the memory runtime asserts being optimized out.)
This allows for programs that are not just memory safe, but "application invariant" safe as well. Right? I mean it's not really a totally new concept, I guess it's kind of "design by contract" or whatever, but with a slight performance bent because the optimizer tells you what runtime checks it's having trouble getting rid of. And maybe there would be a way to indicate that you expect the optimizer to be able to get rid of certain runtime checks, and instruct it to generate a warning (or error) if it doesn't. I'm just sayin'...
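One small existing instance of "compiler-inserted runtime check that the optimizer may remove" is slice indexing in Rust, sketched below with made-up function names. Indexing is bounds-checked at runtime, and whether a given check is elided is up to the optimizer; as far as I know there is no built-in report of which checks survived, so that part of the proposal is not covered here.

```rust
/// Indexing form: every `values[i]` carries a runtime bounds check, which
/// the optimizer is free to drop when it can prove `i < values.len()`.
fn sum_indexed(values: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..values.len() {
        total = total.wrapping_add(values[i]);
    }
    total
}

/// Iterator form: the loop structure itself guarantees in-bounds access.
fn sum_iter(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

/// Explicit form: the check is visible in the return type.
fn nth(values: &[u32], i: usize) -> Option<u32> {
    values.get(i).cloned()
}

fn main() {
    let data = [1u32, 2, 3];
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
    println!("{:?} {:?}", nth(&data, 1), nth(&data, 5));
}
```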
I don't think it works. All of the "runtime asserts" require bookkeeping. That bookkeeping ends up being worse in terms of performance than what you have with a GC.
> Right, but at this point you have a very weird looking subset of C++
It's a little weird looking at first glance, but ultimately it's not really that weird. The main unfamiliar thing is that objects that are going to be the target of a (safe) pointer need to be declared as such. So
{
std::string s1;
auto s1_ptr = &s1;
}
becomes
{
mse::TXScopeObj<std::string> s2;
auto s2_ptr = &s2;
}
s2 acts just like a regular string. It's just wrapped in a (transparent) type that overloads the & (address of) operator so that s2_ptr is a safe pointer. (For example, in this case s2_ptr cannot be retargeted or set to null).
> that can't seamlessly integrate with other libraries,
Sure it can, that's the point. For example:
{
std::string s1 = "abc";
mse::TXScopeObj<std::string> s2 = "def";
auto s2_ptr = &s2;
std::string s3 = s1 + s2; // s2 totally works where an std::string is expected
s3 += *s2_ptr;
*s2_ptr = s1; // and vice versa
}
> and can't be translated to from regular C++ without significant human intervention --
Umm, it could be automated, but you would need a tool that can recognize object declarations. But modern C++ code is mostly safe already. I mean you're supposed to try to avoid pointers in favor of standard containers and iterators. So just replace your "std::vector"s with "mse::mstd::vector"s and your "std::array"s with "mse::mstd::array"s and you're mostly there.
> why not just use Rust?
My impression is that Rust has been evolving a lot. Is the language stable now? Is it time to jump in? Has it vanquished D as the successor to C++? Are we happy with Rust's solution for exceptions?
Even if Rust is the future, and the future is here, I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe. There must be others in the same boat.
> It's a little weird looking at first glance, but ultimately it's not really that weird.
Readability is important for maintainable code. And safe coding patterns tend to involve a lot of sum types (which you can model in C++ with the visitor pattern, but it's significant overhead in code length and possibly even at runtime), and a fair amount of generics (which are cumbersome in C++, and the error reporting is awful). If you're not going to get the existing tool/library infrastructure either way, and you're just evaluating the two on their merits as languages, I don't think you'd ever want to pick C++ over Rust.
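For readers who haven't seen sum types in Rust, a minimal sketch follows. The `Command` type is a hypothetical example. The `match` has to cover every variant, so adding a new variant later turns every unhandled `match` into a compile error.

```rust
/// Hypothetical message type: exactly one of three shapes at a time.
enum Command {
    Stop,
    Move { dx: i32, dy: i32 },
    Say(String),
}

/// The compiler checks that every variant is handled.
fn describe(cmd: &Command) -> String {
    match cmd {
        Command::Stop => "stop".to_string(),
        Command::Move { dx, dy } => format!("move by ({}, {})", dx, dy),
        Command::Say(text) => format!("say {}", text),
    }
}

fn main() {
    let script = vec![
        Command::Move { dx: 1, dy: -2 },
        Command::Say("hello".to_string()),
        Command::Stop,
    ];
    for cmd in &script {
        println!("{}", describe(cmd));
    }
}
```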
> modern C++ code is mostly safe already.
I've been hearing that for about a decade now (and I suspect the only reason it isn't longer is that I wasn't programming before then). And yet we still see bugs, all the time. Not subtle bugs, but stupid, obvious bugs.
> Is the language stable now?
Yes, as of 1.0.
> Is it time to jump in? Has it vanquished D as the successor to C++? Are we happy with Rust's solution for exceptions?
Yes.
> Even if Rust is the future, and the future is here, I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe.
My belief is that no amount of whack-a-mole is going to make those projects memory-safe, and none of the linters/checkers/dialects is ever going to reach a point where it offers actual guarantees. If it were possible it would have happened by now. The only way you're going to get to memory safety is by rewriting those projects, bottom to top (which is probably what you'd have to do to use one of these C++ dialects anyway). If you want to do the migration gradually (and you should!) rust has pretty good interop.
> The main unfamiliar thing is that objects that are going to be the target of a (safe) pointer need to be declared as such.
Your proposal was to take Rust's static analysis and make it work with C++. It's clear you don't know Rust. Why are you so confident about what kind of effect that would have on the language? Rust is not "like C++ but with more static analysis", it's a very different language. A lot of the safety that modern C++ gets you is something that Rust gets you, using different mechanisms.
> Sure it can, that's the point. For example:
This example seems to be a SaferCPlusPlus example? I'm talking specifically about your proposal to take Rust's static analysis and use it on C++. That isn't what SaferCPlusPlus seems to be doing. It seems like you might be talking about something else? The general applicability of safety based static analysis? I'm not arguing with that.
> My impression is that Rust has been evolving a lot. Is the language stable now?
It's still evolving, just like C++ is, but it is stable now. Has been for more than a year.
> Are we happy with Rust's solution for exceptions?
I am. Most folks in the Rust community are. There are no missing pieces now, though.
> Has it vanquished D as the successor to C++?
No, and that's subjective, and your C++-with-Rust's-static-analysis will not be in a different boat.
> I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe.
That's my point. The amount of work to convert existing C++ code to something that satisfies a static analyzer using Rust's exact set of invariants is just as much as the work required to convert to Rust. You won't be able to just throw a new static analyser at C++ code and stuff will magically work. It will require significant refactoring and effort. Nor will your code be able to easily talk with other C++ libraries.
> Umm, it could be automated
No, "human intervention" I said. It can't be automated easily, because the style it enforces is significantly different. I've done quite a bit of jumping back and forth between C++ and Rust these days (in the same codebase, with FFI), and the fact that the structure and style of programs is different is very apparent.
There is work on translating C to Rust (which might grow to cover C++ some day?), but IIRC you still need significant human intervention. For C at least there is no existing safety system to replace, so it's still easier, but translating from C++'s (largely incompatible) existing safety system will be tough.
Translating code will need the translator to figure out what the code is trying to do, basically. This isn't like Python2->Python3. Like I said, the style enforced is different. I don't mean syntax style, I mean how code is structured at a higher level.
> I mean you're supposed to try to avoid pointers in favor of standard containers and iterators
If you want to be 100% safe, you need to solve iterator invalidation, and Rust's solution is something that is very hard to make work with C++'s usual style of coding. If you want to avoid all unnecessary allocations and refcounting, you need a lifetime system. To use Rust's model, the mechanism of moving would have to be tweaked considerably.
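To show what the iterator invalidation rule means for coding style, here is a sketch with a made-up function name. Mutating a vector while iterating over it is rejected at compile time, so the code is structured differently: either let the container do the removal itself, or finish iterating before mutating.

```rust
/// Removes odd values, then appends the double of every remaining value.
fn drop_odds_then_double(mut v: Vec<i32>) -> Vec<i32> {
    // Rejected by the borrow checker, because the loop holds a shared
    // borrow of `v` while `push` needs a mutable one:
    //     for x in &v { v.push(*x * 2); }

    // Removal during traversal is delegated to the container.
    v.retain(|x| x % 2 == 0);

    // Appending is done in two phases: the iterator's borrow ends when
    // `collect` returns, and only then is `v` mutated.
    let doubled: Vec<i32> = v.iter().map(|x| x * 2).collect();
    v.extend(doubled);
    v
}

fn main() {
    println!("{:?}", drop_odds_then_double(vec![1, 2, 3, 4]));
}
```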
Again, these problems can probably be solved organically from C++ itself (which I guess is what SaferCPlusPlus is doing?), building a static analyser that tries to solve them building on the existing mechanisms in C++. But importing Rust's analysis will just get you a completely new language which has almost no use.
Oh yeah, didn't mean to give the impression otherwise. But I think I've gained some understanding since yesterday. I'm just learning, but tell me if I'm getting this at all:
- Rust only considers scope lifetimes (and "static" lifetime which is basically like the uber scope)?
- References can only target objects with a superset (scope) lifetime.
- You can only use one non-const reference to an object per scope. This solves the aliasing issue?
> This example seems to be a SaferCPlusPlus example? I'm talking specifically about your proposal to take Rust's static analysis and use it on C++.
Sorry, I misunderstood. I thought you'd switched context. Let me try again:
There are a couple of reasons for pursuing "Rustesque" programming in C++ as opposed to in Rust itself. First let me point out that there would have to be a mechanism for distinguishing between "statically enforced" safe blocks of C++ code and the rest of the code (just like Rust's "unsafe" blocks I guess).
So then the obvious advantage is a better interface to C++ code and libraries. Rust only supports plain C (FFI) interfaces? Is that right?
But another argument is that there are multiple strategies to achieve memory safety (and code safety in general). The two popular ones are the Rust strategy and the GC strategy. One is not uniformly superior to the other. Superior maybe, but not uniformly so. Presumably the Rust strategy will be more memory efficient, and maybe theoretically faster, whereas the GC strategy might facilitate higher productivity.
If you choose Rust, you're committed to one strategy. Now, I don't know if it'll turn out to be realistic, but I'm wondering if it's possible that C++ can support both strategies. (And maybe some other ones too.) Not just different strategies in different applications, but even in the same application. The Rust static analyzer would of course only work on indicated blocks of code.
Of course writing code in one strategy or another would be more clunky in C++ than a language specifically designed for it, but everything's a trade-off. The question is, is it worth it?
It's easy to say the clunkiness isn't worth it, but Rust probably has the weakest argument in that respect. Right? (I mean doesn't Rust have a reputation of being clunky anyway?)
Again, I barely know any Rust, but it seems to me that the main safety functionality that Rust provides over, say, SaferCPlusPlus, is the static enforcement of "one non-const reference to an object per scope" as an efficient, but restrictive, solution to the aliasing issue.
Hmm, obviously I have to find some time to learn Rust better, but intuitively, it seems like the simple Rust examples I've seen so far would have a corresponding C++ implementation, and it's not immediately obvious to me why a static analyzer couldn't work on the corresponding C++ code. Is there a simple example that demonstrates the problem? Am I just underestimating the difficulty of static analysis?
> You can only use one non-const reference to an object per scope. This solves the aliasing issue?
More accurately, if you have a mutable reference you cannot have any other references.
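To make that rule concrete, here is a minimal sketch (the helper names `first_plus_last` and `bump_all` are made up for illustration, and the slice is assumed to be non-empty). Shared borrows can overlap freely; the exclusive borrow is accepted only once they have ended, and the commented-out lines show the kind of overlap the compiler rejects.

```rust
// Hypothetical helper: several shared (read-only) references coexist here.
// Assumes `values` is non-empty.
fn first_plus_last(values: &[i32]) -> i32 {
    let first = &values[0];
    let last = &values[values.len() - 1];
    *first + *last
}

// Hypothetical helper: needs an exclusive (mutable) reference.
fn bump_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v += 1;
    }
}

fn main() {
    let mut values = vec![1, 2, 3];

    // The shared borrows end when the call returns...
    let total = first_plus_last(&values);
    // ...so an exclusive borrow is allowed afterwards.
    bump_all(&mut values);

    // This overlap would be rejected at compile time, because `r` is still
    // in use while `values` is mutated:
    // let r = &values[0];
    // values.push(4);
    // println!("{}", r);

    println!("{} {:?}", total, values);
}
```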
> Rust only supports plain C (FFI) interfaces? Is that right?
Yes, but with bindgen you have a decent C++ interface.
My contention is that the "better interface" is only slightly better, and probably not enough to justify basically creating a whole new language. Note that for your safe RustyCPP code, the regular-C++ code will be completely unsafe to use and you'll have to write some safety wrappers that encode in the guarantees you need. I've been doing this in the Rust integration in Firefox, and I'm sure that a dialect of C++ that uses Rust's rules will need to do something similar. That's where the bulk of the integration cost comes from.
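As a rough illustration of what such a safety wrapper looks like (not the actual Firefox code; `raw_sum` is a hypothetical stand-in for a function that would normally live on the C or C++ side), the unsafe contract is stated once and a safe function encodes it in its signature:

```rust
// Hypothetical stand-in for a foreign function. Contract: `ptr` must point
// to `len` readable, initialized i32 values.
unsafe extern "C" fn raw_sum(ptr: *const i32, len: usize) -> i64 {
    let mut total = 0i64;
    for i in 0..len {
        // Reads element `i`; only valid if the caller upheld the contract.
        total += unsafe { *ptr.add(i) } as i64;
    }
    total
}

// Safe wrapper: a slice always carries a matching pointer/length pair, so
// callers cannot violate the contract above.
fn sum(values: &[i32]) -> i64 {
    unsafe { raw_sum(values.as_ptr(), values.len()) }
}

fn main() {
    println!("{}", sum(&[1, 2, 3]));
}
```

The wrapper is where the review effort goes: everything that calls `sum` can stay in safe code.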
> If you choose Rust, you're committed to one strategy
I mean, you can just blindly use Rc<T> or Gc<T> in Rust (Gc<T> only exists as a POC right now but we plan to get a good one up some day).
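A small sketch of the "just use `Rc<T>`" route, using only the standard library (the `Account` type and `deposit` function are invented for the example). Ownership is shared through reference counts and mutation goes through `RefCell`, which checks the borrowing rules at runtime instead of at compile time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type used only for this illustration.
struct Account {
    balance: i64,
}

// Mutates through a shared handle; RefCell enforces exclusivity at runtime.
fn deposit(account: &Rc<RefCell<Account>>, amount: i64) {
    account.borrow_mut().balance += amount;
}

fn main() {
    let shared = Rc::new(RefCell::new(Account { balance: 10 }));
    let alias = Rc::clone(&shared); // second owner, same allocation

    deposit(&alias, 5);

    println!(
        "{} owners, balance {}",
        Rc::strong_count(&shared),
        shared.borrow().balance
    );
}
```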
But yeah, magical pervasive GC would be hard to do in Rust.
> The question is, is it worth it?
You're arguing between choosing Rust vs CPP-with-static-analysis. I'm arguing between choosing Rust vs CPP-with-Rust-esque-static-analysis. I think the latter strongly points towards Rust, but the former has interesting tradeoffs.
> I mean doesn't Rust have a reputation of being clunky anyway?
Not ... really? It has a reputation for having a steep initial learning curve.
> it seems like the simple Rust examples I've seen so far would have a corresponding C++ implementation
Oh, this would work. But the reverse -- taking C++ code and making it work under the Rust rules -- is very hard. Not because of the aliasing rules, but because of how copy/move constructors are used in C++ (Rust's model strongly depends on initialization being necessary), the whole duck-typed-templates thing in C++, and similar things with respect to coding patterns that don't translate well.
Again, you could build a safety system on C++ that respects these patterns, but it would not be the same as taking Rust's rules and enforcing them on C++.
> Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)
No, we can't do that. It is incompatible with C++.
> Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.
That is not possible. It would require massive bookkeeping, much like your library does. That would eliminate most of the benefits of Rust.
Sometimes I think Rust people lose the forest for the trees. The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper.
It doesn't really matter if they both end up at the same place, which is safe software.
> Sometimes I think Rust people lose the forest for the trees. The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper.
I don't care if the software is verified via libraries or compilers. The problem is that C++ verifiers don't work.
> It doesn't really matter if they both end up at the same place, which is safe software.
I think the contention is that, unless you're applying NASA style rigor, you don't end up in the same place without verifying the safety automatically, because in practice it's too expensive to verify the safety manually (without getting squeezed out of the space by your competitors.)
SaferCPlusPlus's goals are noble, but approaching the problem with a library-only solution is problematic. None of the huge swaths of legacy and third party code I'd like to sanitize uses it - and a large scale rewrite to 'fix' that may very well introduce more bugs than it fixes. A library cannot 'fix' fundamental language constructs either, short of telling you to please remember to perfectly avoid those language constructs even if you're very very used to them. Frankly, I'm skeptical of how useful I'd find SaferCPlusPlus even for new projects - especially when modern SC++L implementations already have a lot of error checking code built into them as well, at least for debug builds.
I'm interested in Rust because it takes the same approach to securing code from bugs as seems to help a lot when I apply it to C++: Static analysis and annotations, designs to make edge cases impossible to ignore, and where static analysis cannot perfectly find all problems, let it error out reliably at runtime instead of randomly corrupting memory unless I really really really mean it.
> I think the contention is that, unless you're applying NASA style rigor, you don't end up in the same place without verifying the safety automatically, because in practice it's too expensive to verify the safety manually (without getting squeezed out of the space by your competitors.)
That's a claim that has yet to be shown to be true. Maybe it is true, and maybe it isn't, but C++ compilers tend to give pretty good warnings that you can treat as errors, and coupled with good external tools it isn't clear that rust is significantly safer than C++.
The scary part of it all is how many rust users seem to think that it is a given when even the rust standard vec container has unsafe code in it.
I personally think that if rust is shown to statistically decrease the security/error rate on large projects, it's going to be with the use of 3rd party tools, not the specific semantics of the language. I'm of the opinion that the beauty of the unsafe block isn't in any inherent "safety", as much as it is giving more semantics for 3rd party tools to analyze.
> C++ compilers tend to give pretty good warnings that you can treat as errors
They miss far too many simple cases for this to possibly be a sensible claim, e.g. neither gcc -Wall nor clang -Weverything warns about the two massive problems in the following code:
    #include <vector>
    int &foo() {
        std::vector<int> v{ 0, 1, 2, 3 };
        int &x = v[0];
        v.clear();
        int y = x; // dereferencing dangling pointer!
        (void)y;
        return x; // escaping a dangling pointer!
    }
Rust is clearly a step up since it does actually catch these. The Rust compiler is the "third party" tool that helps get better code: unlike in C and C++, the static analysis is built in.
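For comparison, a rough Rust counterpart of `foo` (a sketch, not a line-by-line translation). The reference-holding version is left in comments because the borrow checker refuses to compile it; the version that compiles copies the value out before the vector is cleared:

```rust
// Compiles: the value is copied out, so nothing refers into `v` afterwards.
fn foo() -> i32 {
    let mut v = vec![0, 1, 2, 3];
    let y = v[0];
    v.clear();
    y
}

// Does not compile. Two separate complaints, matching the two C++ bugs:
//
// fn bad() -> &'static i32 {
//     let mut v = vec![0, 1, 2, 3];
//     let x = &v[0];
//     v.clear();      // error: `v` is mutably borrowed while `x` is still live
//     let _y = *x;
//     x               // error: the reference would outlive the local `v`
// }

fn main() {
    println!("{}", foo());
}
```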
This is running the static analysis pass instead of the normal compile pass, but stuff is improving. Of course you're preaching to the choir as far as I'm concerned - this stuff is way late to the party, and speaking generally, has issues with false positives and failing to detect things.
I'll have to check when I get home, but I'm fairly certain you're purposefully suppressing the compilers warnings here. That's not a good way to make your argument about the compiler not being able to warn you about problems.
Yes, I'm purposely suppressing the unused variable warning with the (void)y;, because presumably real code will actually do something with the value: I could've printed y or left out that line, whatever, the compilers still don't warn about the actual major problems.
Your argument as to why the compiler won't warn you about problems is to show an example where you purposefully suppress the warnings the compiler gives you.
I honestly don't think we should continue this conversation.
- not suppressing a warning about a memory safety problem
- not affecting the lack of warnings for the memory safety problems: remove the `(void)y;` line and there are still no warnings about the dangling pointers.
Seriously, you are focusing on something irrelevant. Either pretend I didn't write that line, or pretend it was std::cout << y << std::endl;. The fundamental fact remains that the compilers do not warn about the major problem of handling dangling pointers, despite both of these being fairly trivial cases, just a tiny step up from pure stack allocation.
Yes, C++ compilers do have some warnings for some things, but the interesting warnings for this topic are insidious memory safety bugs like dangling references, not the basic unused variable ones. Rust warns about both, C++ compilers catch only the second one: the code I wrote is wrong for two reasons, and neither of those reasons is the unused variable.
If you're going to tout the quality of C++ compiler's warnings, they better flag as many cases of problems like use after free (and use after move), dangling references and iterator invalidation as they can, but I've never had a C++ compiler warn about any of these (other than the most basic case of returning a reference to a local variable).
> The fundamental fact remains that the compilers do not warn about the major problem of handling dangling pointers
I'm going to quote myself, emphasis mine.
"The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper."
[snip]
"C++ compilers tend to give pretty good warnings that you can treat as errors, and __coupled with good external tools__ it isn't clear that rust is significantly safer than C++."
That's not the point. Array types in Ruby and Python are implemented in C. No one goes around saying those languages are actually no more memory safe than C++ (or maybe you do?).
> No one goes around saying those languages are actually no more memory safe than C++ (or maybe you do?).
It's unfortunate that you've chosen to try and make the scope smaller by referring specifically to "memory safety".
As a result, this will be my last response to you, I just don't have the energy to go back and forth with someone who isn't willing to be honest in this discussion.
But to answer your question, those languages are no safer than C++. I can write a C plugin in both that contains memory leaks and various safety issues. And in fact, both projects have had their own security problems.
> But to answer your question, those languages are no safer than C++. I can write a C plugin in both that contains memory leaks and various safety issues. And in fact, both projects have had their own security problems.
This definition makes any comparison of the safety of different languages totally useless: according to it, all languages are equally unsafe. You're free to want to use that definition, but it's a tautology and thus doesn't actually allow distinguishing between anything nor serve any purpose.
It's true that all languages offer escape hatches, but it's also true that there's a major qualitative (at least) difference between the constrained rarely used escape hatches of Python, Java and Rust, and the "the whole language is an escape hatch" approach of C++ and C.
In mathematics and the verification of programs, proofs will build from small proofs: first show that a function `foo` has a certain behaviour and then use this to show that `bar` (which calls `foo`) has another behaviour, etc etc, until the whole program is proved correct. Languages like Python, Java and Rust are designed with this in mind: prove the unsafe code correct and the language guarantees the rest of the code is memory safe. C and C++ have no such property: a proof of memory safety requires touching every single line of code, not just the small number that actually need to escape down a level.
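A tiny sketch of that "small proof" structure (the function `middle` is made up for the example). The argument for the `unsafe` block fits in one comment and depends only on the lines directly above it, so callers in safe code need no extra reasoning:

```rust
/// Returns the middle element, or `None` for an empty slice.
fn middle(values: &[u8]) -> Option<u8> {
    if values.is_empty() {
        return None;
    }
    let idx = values.len() / 2;
    // SAFETY: the slice is non-empty, so len >= 1 and len / 2 < len,
    // which means `idx` is in bounds.
    Some(unsafe { *values.get_unchecked(idx) })
}

fn main() {
    println!("{:?}", middle(&[1, 2, 3]));
}
```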
> It's true that all languages offer escape hatches
And that all languages experience safety issues as a result of these escape hatches, and that all languages suffer security issues despite sequestering these escape hatches.
Which goes back to what I said before.
"That's a claim that has yet to be shown to be true. Maybe it is true, and maybe it isn't ..."
[snip]
"I personally think that if rust is shown to statistically decrease the security/error rate on large projects, it's going to be with the use of 3rd party tools, not the specific semantics of the language. I'm of the opinion that the beauty of the unsafe block isn't in any inherent "safety", as much as it is giving more semantics for 3rd party tools to analyze."
> In mathematics and the verification of programs, proofs will build from small proofs: first show that a function [snip]
This is a non-sequitur. You're trying to compare a deductive proof in a formal logic system whose only requirement is to be internally consistent with messy reality. Look at the difference in approach. I said we won't know until we have enough experience and data to analyze to see if there's a significant statistical difference between the error rates of software written in C++ vs Rust. You basically said we already know because we can write small programs that are safe, therefore we can write large programs that are safe. It's a non-sequitur.
> a proof of memory safety requires touching every single line of code, not just the small number that actually need to escape down a level.
And the same can be said of Rust, the unsafe blocks give a false sense of security. No one really cares if it crashed in an unsafe block if the root cause is from state manipulated in safe code somewhere away from the unsafe block. It takes a lot of discipline and scrutiny to make sure you don't accidentally put the state into a spot where the unsafe block can do bad things. This is the same sort of discipline required in C++.
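A sketch of the situation being described (the `FixedStack` type is invented for illustration). The `unsafe` block in `top` is only correct because the safe method `push` keeps `len` within bounds. What limits the review burden in this sketch is that `len` is private to the module, so only code inside `mod stack` can break the invariant:

```rust
mod stack {
    pub struct FixedStack {
        buf: [u32; 4],
        len: usize, // invariant: len <= buf.len()
    }

    impl FixedStack {
        pub fn new() -> Self {
            FixedStack { buf: [0; 4], len: 0 }
        }

        // Entirely safe code, yet it maintains the invariant `top` relies on.
        pub fn push(&mut self, value: u32) -> bool {
            if self.len == self.buf.len() {
                return false; // full: refuse instead of breaking the invariant
            }
            self.buf[self.len] = value;
            self.len += 1;
            true
        }

        pub fn top(&self) -> Option<u32> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: 1 <= len <= buf.len(), as maintained by `push`.
            Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
        }
    }
}

fn main() {
    let mut s = stack::FixedStack::new();
    s.push(1);
    s.push(2);
    println!("{:?}", s.top());
}
```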
That's the point you're not getting, and it's why I think 3rd party tools that can tell us more about the code being affected by the unsafe block is going to be more useful in the long run. Imagine a tool that gets run on checkin, or at specific intervals that can identify immediately that there are code changes that manipulate state that an unsafe code block depends on? It means developers can then examine the changes to make sure nothing bad happens.
Or you're in an IDE that changes the variable color to indicate that what you're working with affects an unsafe block, so you can be sure that you need to pay extra careful attention and definitely get a code review.
These same techniques work successfully in C++. People deal with it in the exact same manner, they put it behind an interface and use code reviews and external tools to identify potentially dangerous things that human beings then step in and examine much more closely.
The point is, there is nothing inherent in rust that definitely makes it safer than C++. There are potentially aspects of it that enable better tooling that could eventually make it safer than C++, but it will take time and careful analysis before it's obvious that it's safer.
Modern C++ tends to sequester these things off the way Rust would.
> It's unfortunate that you've chosen to try and make the scope smaller by referring specifically to "memory safety".
Okay, back to the broader scope - what's an area that you think Rust might do worse than C++ at? I'd be very interested in fixing any blind spots I might have.
Reading an array out of bounds is almost never correct and is quite likely to be a security vulnerability. Memory safety is absolutely a prerequisite for any other sort of safety one might want.
We agree on that, my point is that C++ does it via libraries, Rust does it by hiding unsafe blocks behind interfaces (aka libraries).
Time will tell which approach is ultimately superior (if either one of them is actually better), but until then it isn't clear that the Rust approach is statistically better than the C++ approach.
Ultimately the advantage Rust has is the ability to possibly provide better 3rd party tooling that will enable developers to make the right decisions more often than C++ does. Consider a tool that runs on code checkin that spits out a report of all sites where code that manipulates state that could affect an unsafe block was changed/written so that developers could then have a very focused peer review of the code to ensure the safe code doesn't put the state in such a spot that it causes problems.
I think in this way Rust may eventually be shown to be better than C++, but then again, maybe not.
> None of the huge swaths of legacy and third party code I'd like to sanitize uses it - and a large scale rewrite to 'fix' that may very well introduce more bugs than it fixes.
SaferCPlusPlus is designed for compatible interaction with unsafe legacy code and library interfaces. Some may see this as a flaw. But it allows you to incrementally "improve" C++ code without requiring a total rewrite. It also means that members of a team can adopt it unilaterally. It's regular C++ code that won't interfere or impose on your co-programmers, even when you're working on the same code.
> A library cannot 'fix' fundamental language constructs either, short of telling you to please remember to perfectly avoid those language constructs even if you're very very used to them.
Right, but the "safe replacement" elements in the library are designed to behave just like their unsafe counterparts, perhaps making the transition easier. In terms of enforcement, I think it may be a "use it and they will build it" scenario. Once there is significant adoption of the SaferCPlusPlus library, it should take a relatively modest effort to implement a static enforcer. I mean, you just want to flag any uses of unsafe elements, not even do any analysis on them.
> Frankly, I'm skeptical of how useful I'd find SaferCPlusPlus even for new projects - especially when modern SC++L implementations already have a lot of error checking code built into them as well, at least for debug builds.
That's the beauty of SaferCPlusPlus. Let's say you're using std::vector<> somewhere in your program. You can just replace "std::vector<>" with "mse::mstd::vector<>" and now your vector is (optionally) safe. With a compiler directive you can choose to "disable" the safety features in any build (i.e. mse::mstd::vector<> will be automatically aliased back to std::vector<>). Compilers generally just do bounds checking (the "sanitizers" notwithstanding). SaferCPlusPlus checks for things like "use-after-free" as well.
And you don't need to link to any library. You just need to add a couple of header files to your project.
The sanitizers are fantastic. But they're not quite a substitute for SaferCPlusPlus [1]. SaferCPlusPlus addresses the issue of safely accessing objects from asynchronous threads.
> Static analysis and annotations, designs to make edge cases impossible to ignore, and where static analysis cannot perfectly find all problems, let it error out reliably at runtime instead of randomly corrupting memory unless I really really really mean it.
SaferCPlusPlus is not a competitor to, or an excuse to neglect, static analysis. SaferCPlusPlus exists because static analysis does not fully solve the problem.
Sorry, I misread "ThreadSafetyAnalysis" as ThreadSanitizer [1]. Like I said, static analyzers are great. Some may feel that they sufficiently address the code safety issue in practice, some may not.
You're 100% correct that the end goal is safe software in a way that's practical to achieve. However, having the computer check one's code is generally regarded as a great way (even the best way) to do this: NASA's JPL doesn't accidentally recommend[0] turning on all compiler warnings and using static analysis tools, and it seems a little unlikely that most major tech companies would be spending millions on static analysers and statically-typed languages if they didn't think it helped them write correct code.
Well... how do you know the code is safe otherwise? Exhaustive testing is unreasonable. How do you ensure that you have achieved the end goal of safe software?
> because everyone already has their 20 year old [C] codebases
It's comments like these that remind me how exclusionary the software world is. Your definition of "everybody" is such a tiny number of people. But that's who you have in mind when you are constructing the world around you each day.
Well, one might notice "everybody" has to be a rhetorical exaggeration, since there's absolutely no way even remotely close to "everybody" would have any codebase at all whatsoever, not to talk about 20 year old legacy. Right?
Second, I was responding to a comment talking about "existing, critical, real time applications". Of which a huge number of cases do have existing, very old legacy codebases.
Third, I fail to see what you tried to bring to the conversation. If your only problem was with my rhetoric, see above.
Rust has libcore, which is, in many ways, more featureful than libc despite also having zero dependencies. From my perspective, the main advantage of C is the way in which chip manufacturers only provide (poorly supported/bug-ridden) C compilers, but this is likely to become less important as ARM takes over more and more of the world: it is only getting easier and cheaper to throw a full ARM chip into a device, due to economies of scale.
There are engineers with 20 years of C programming experience that will still make security errors while handling basic strings. "Small" does not mean "good" and "learning" a language doesn't mean you'll write good code with it.
No, small is good. But C isn't small. It's actually massive, and not terribly orthogonal. It's peppered with special cases, and things people think are true but actually aren't (how would you check for an integer overflow in C?).
It's like comparing x86 to, say, m68k (or most things, really). One was designed. The other is an ungainly mess of hacks on top of hacks, which has a good, elegant design in there somewhere, desperately trying to get out. Guess which one is x86.
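For contrast with the overflow question, this is how the check looks on the Rust side, where the standard integer types expose it directly (a sketch; `add_or_none` is a made-up wrapper name, the `checked_add`, `overflowing_add`, `wrapping_add` and `saturating_add` methods are from the standard library):

```rust
// Hypothetical wrapper: returns None instead of overflowing.
fn add_or_none(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

fn main() {
    // Overflow is reported as a value, not as undefined behaviour.
    println!("{:?}", add_or_none(i32::MAX, 1));
    println!("{:?}", add_or_none(40, 2));

    // Other explicit policies for the same addition:
    println!("{:?}", i32::MAX.overflowing_add(1)); // wrapped result plus a flag
    println!("{}", i32::MAX.wrapping_add(1));      // two's complement wrap
    println!("{}", i32::MAX.saturating_add(1));    // clamp at the maximum
}
```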
Now guess which one is C.
Worse really is better. Or at least, good enough.
C isn't a complete mess, and you can write good code in it if you're very careful, but it's not great.
It's starting to go that way but that's a very hard space to push new things into. I talk with all my ex-gamedev contacts and they're hesitant to even use lambda or other C++11 features that have been around for a while now.
I think Mozilla's plan of driving things forward with Servo and using that as a large-scale example of the gains that can be made is a good approach.
> IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.
Rust cannot make any latency guarantees either. Reference counting and its lifetimes also have pathological cases, ie. worst-case, an object can reference the entire heap which will take time proportional to the number of dead objects to free.
Copying collection in this case takes literally zero time, but its pathological case is when all referenced data survives the current GC cycle, ie. proportional to live objects.
> Rust cannot make any latency guarantees either. Reference counting and its lifetimes also have pathological cases, ie. worst-case, an object can reference the entire heap which will take time proportional to the number of dead objects to free.
Rust doesn't use reference counting by default. Refcounting is very rare in Rust, much more rare than it is in C++. Most large C++ codebases I've worked with have thrown in the towel and started refcounting all the things. In Servo, for example most of the refcounting is across threads (where you basically have no other option), and a few interesting cases in the DOM, each with very good reasons for using refcounting.
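To show the cross-thread case where refcounting is the natural tool, here is a standard-library-only sketch (not Servo code; `parallel_sum` and its chunking scheme are invented for the example). Each worker thread gets its own `Arc` handle to the same read-only vector:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper: sums `data` using `workers` threads that share it.
fn parallel_sum(data: Arc<Vec<u64>>, workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    // Ceiling division so every element lands in some chunk.
    let chunk = (data.len() + workers - 1) / workers;

    let handles: Vec<thread::JoinHandle<u64>> = (0..workers)
        .map(|w| {
            // Bumps the atomic reference count; no data is copied.
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let start = (w * chunk).min(data.len());
                let end = (start + chunk).min(data.len());
                data[start..end].iter().sum::<u64>()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}

fn main() {
    let data = Arc::new((1..=100u64).collect::<Vec<u64>>());
    println!("{}", parallel_sum(data, 4));
}
```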
Lifetimes are a concept at compile time and don't exist at runtime.
Edit: Oh, I see what you're talking about. A sufficiently large owned tree/graph in Rust will introduce latency. It's predictable latency though. I can make the same argument about for loops.
Unpredictably sized large trees in Rust are again pretty rare in general.
Trees don't have to be refcounted in Rust. Single-ownership trees are possible.
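A minimal sketch of a single-ownership tree (the `Node` type and `total` function are invented for the example). Each node owns its children through a `Vec`, so there are no reference counts anywhere, and dropping the root frees the whole tree:

```rust
// Each node exclusively owns its children; no Rc, no GC.
struct Node {
    value: i32,
    children: Vec<Node>,
}

// Walks the tree through shared references only.
fn total(node: &Node) -> i32 {
    node.value + node.children.iter().map(total).sum::<i32>()
}

fn main() {
    let tree = Node {
        value: 1,
        children: vec![
            Node { value: 2, children: vec![] },
            Node {
                value: 3,
                children: vec![Node { value: 4, children: vec![] }],
            },
        ],
    };
    println!("{}", total(&tree));
} // `tree` goes out of scope here and every node is freed in one cascade
```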
As long as they don't have backpointers. Backpointers are a problem under single ownership.
Right, I never said that trees have to be refcounted. A sufficiently large ownership tree will get deallocated all at once, which is the kind of latency the GP is talking about.
Linked data structures in Rust get complicated, though. See the "Too many lists" book.[1] Doubly linked lists, or trees with backlinks, are especially difficult. Either you have to use refcounts, or the forward pointer and backward pointer need to be updated as an unsafe unit operation. There might be an elegant way to do this with swapping, but I'm not sure yet.
Right, so you implement them with unsafe. While you can implement doubly linked lists safely with refcounting, you're perfectly free to implement them with unsafe code. This is what unsafe code is for, designing low level abstractions with clean API boundaries.
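For reference, the safe refcounted variant looks roughly like this (a sketch with made-up names `Node`, `Link`, `new_node`, `link` and `prev_value`). Forward pointers are strong `Rc` handles and back pointers are `Weak`, so the two directions do not keep each other alive:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // strong: owns the next node
    prev: Option<Weak<RefCell<Node>>>, // weak: does not own the previous node
}

type Link = Rc<RefCell<Node>>;

fn new_node(value: i32) -> Link {
    Rc::new(RefCell::new(Node { value, next: None, prev: None }))
}

// Connects `a -> b` and `b -> a` without any unsafe code.
fn link(a: &Link, b: &Link) {
    a.borrow_mut().next = Some(Rc::clone(b));
    b.borrow_mut().prev = Some(Rc::downgrade(a));
}

// Follows the back pointer, if the previous node is still alive.
fn prev_value(node: &Link) -> Option<i32> {
    let prev = node.borrow().prev.as_ref()?.upgrade()?;
    let value = prev.borrow().value;
    Some(value)
}

fn main() {
    let a = new_node(1);
    let b = new_node(2);
    link(&a, &b);
    println!("{:?} {:?}", prev_value(&b), prev_value(&a));
}
```

The cost is the runtime bookkeeping (counts plus `RefCell` borrow flags), which is the trade-off against an `unsafe` implementation with raw pointers.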
If you need unsafe code for basic operations within the language, something is wrong with the language. This isn't about talking to hardware, or an external library. It's pure Rust code.
(Some pointer manipulations can be built from swap as a basic operation. That may work for doubly-linked lists. The other big problem is partially valid arrays, such as vectors with extra space reserved. There's no way to talk about that concept within the language. There could be, but this isn't the place to discuss it.)
> need unsafe code for basic operations within the language
Building custom back-referencing data structures is not a "basic operation" anywhere outside programming classes. Adding significant complexity to Rust to make a 2% case marginally safer would make the language worse. As long as the vast majority of code is not unsafe, then it achieves its goal.
I have been writing Rust code for almost three years now.
I have helped design a low level data structure exactly twice. In both cases, this was a highly custom concurrent data structure, which would have been even harder to get right in C++ or some other language.
If you need a regular run-of-the-mill datastructure it will exist in the stdlib or crates ecosystem. This is not a "basic task". Just because schools teach it early does not make it a "basic task". It's a task that needs to be done at some point, but doing it once and making it part of the stdlib or a crate is all that is necessary. It has become a "basic task" in C++ because it's easy enough to do that you don't need to reach for the stdlib, but that doesn't mean that it's necessary to have a bespoke implementation of a DLL that often in C++; usually the stdlib one will do.
The same "too many lists" book you linked to explains why DLLs are niche datastructures on the first page (singly linked lists can be implemented safely in Rust, though they can be somewhat niche too).
You will always have these pathological cases when you choose to use higher level memory management like simple reference counting or garbage collection no matter what language you use, whether it's Rust or assembler. The point of Rust is that you have complete control over what you use and pay for. If your concern is the overhead of lifetimes then you need to evaluate if you can afford heap allocation in the first place. Otherwise you can make de/allocation explicit in Rust just like in C, without losing the benefits of ownership checking.
Embedded hardware and software can only provide realtime guarantees because they are simpler, without complex pipelines, caches, branch predictors, or thread schedulers. If you want low latency embedded software you have to document the pathological cases, test whether they happen in real world use, and profile the code with each microarchitecture you're targeting anyway, let alone every product family. What language you use doesn't change that.
> You will always have these pathological cases when you choose to use higher level memory management like simple reference counting or garbage collection no matter what language you use
Not true, soft and hard realtime garbage collectors exist. Your runtime simply needs to bound the amount of reclamation work done at any given time.
For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution. Rust would then be realtime without truly changing its observable behaviour, except its timing in some programs.
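The idea of spreading frees over time can be sketched in user code, without any runtime support (the `Graveyard` type and its methods are invented for this illustration and are not an existing Rust facility). Values that are no longer needed are queued, and each call to `reap` drops at most `budget` of them:

```rust
use std::collections::VecDeque;

// Hypothetical deferred-drop queue.
struct Graveyard<T> {
    pending: VecDeque<T>,
}

impl<T> Graveyard<T> {
    fn new() -> Self {
        Graveyard { pending: VecDeque::new() }
    }

    // Hands a value over instead of dropping it right away.
    fn bury(&mut self, item: T) {
        self.pending.push_back(item);
    }

    // Drops at most `budget` queued values and returns how many were dropped.
    fn reap(&mut self, budget: usize) -> usize {
        let n = budget.min(self.pending.len());
        self.pending.drain(..n).for_each(drop);
        n
    }

    fn remaining(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut graveyard = Graveyard::new();
    for i in 0..5 {
        graveyard.bury(format!("item {}", i));
    }
    // e.g. once per frame or per control-loop tick:
    let dropped = graveyard.reap(2);
    println!("dropped {}, {} left", dropped, graveyard.remaining());
}
```

This only bounds the number of queued values dropped per call. If a single queued value is itself a large tree, dropping it still cascades, and destructors now run later than scope exit.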
> For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution.
You can probably make this work by plugging in a different allocator, if jemalloc doesn't do this already. The ability to batch up frees and mallocs isn't tied to GCs.
This won't reduce the perf impact of running a large tree of `Drop` impls, but it will reduce the free calls.
> You can probably make this work by plugging in a different allocator, if jemalloc doesn't do this already. The ability to batch up frees and mallocs isn't tied to GCs.
That gets tricky, because Rust people no doubt expect deterministic destruction on scope exit. But yes, my ultimate point is that low latency is a property of a runtime, not a language. C/C++ or Rust aren't going to automatically give you bounded latency, and adding tracing GC doesn't automatically take it away.
But this expectation is transitive. If you have an array of file handles, if you defer deallocating some of them but destruct them all upfront, you still have the latency issue we've been discussing. And if you defer destructing too, then you still have non-deterministic destruction and deallocation. I'm not sure there's a way around this tradeoff.
>> Not true, soft and hard realtime garbage collectors exist. Your runtime simply needs to bound the amount of reclamation work done at any given time.
That doesn't change anything! You're just choosing a garbage collector with a default deterministic pathological case, which is a guarantee you can make about almost any GC by carefully tailoring your memory usage to your scenario and choice of algorithm. That's what realtime embedded software development is all about: writing code that has predictable timing given your expected inputs and environment. If all you need to do is flip a bit once every 10 minutes with a precision of 1 second while reading 1 bps from a sensor, even a full blown Linux distribution on a modern Intel i7 running a Python or Ruby daemon can be considered "realtime". The language doesn't matter as long as you can predict how long everything is going to take in the worst case and your micro[controller/processor] is fast enough to react.
>> For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution. Rust would then be realtime without truly changing its observable behaviour, except its timing in some programs.
You know that's what the Drop trait is for, right? All you have to do is add whatever memory management code you'd have (in your C program) into the trait implementation and your memory deallocation will behave exactly as it would in any other low level language. These low level facilities have been part of the Rust design from the start, they just don't require you to manually call free() by default. That doesn't mean anything in Rust is stopping you from doing so, and if you want to, you can opt out of the automatic behavior entirely with `std::mem::ManuallyDrop` or `std::mem::forget` (an empty Drop implementation alone is not enough, since the fields are still dropped afterwards). After that, literally anything you can do in C you can also do in a Rust unsafe block.
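To make the `Drop` discussion concrete, here is a minimal sketch. The `Tracked` type and `drop_demo` function are made up for illustration; only `Drop` and `std::mem::ManuallyDrop` come from the standard library. It shows a destructor running at scope exit, `ManuallyDrop` suppressing that automatic call, and an explicit drop at a point of your choosing.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical resource that records each destruction in a shared counter.
pub struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        // Custom cleanup would go here (returning a block to a pool, say).
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count after each of the three steps below.
pub fn drop_demo() -> (u32, u32, u32) {
    let drops = Cell::new(0);

    {
        let _a = Tracked { drops: &drops };
    } // `_a` goes out of scope here, so its destructor runs.
    let after_scope = drops.get();

    {
        let _b = ManuallyDrop::new(Tracked { drops: &drops });
    } // No destructor call: `ManuallyDrop` suppresses it.
    let after_suppressed = drops.get();

    let mut c = ManuallyDrop::new(Tracked { drops: &drops });
    // SAFETY: `c` is not used again after being dropped here.
    unsafe { ManuallyDrop::drop(&mut c) };
    let after_explicit = drops.get();

    (after_scope, after_suppressed, after_explicit)
}
```

Calling `drop_demo()` yields `(1, 1, 2)`: one automatic drop, none for the wrapped value, then one explicit drop.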
> That doesn't change anything! You're just choosing a garbage collector with a default deterministic pathological case, which is a guarantee you can make about almost any GC by carefully tailoring your memory usage to your scenario and choice of algorithm.
The fact that you don't have to tailor anything is precisely the point. Latency is a property of a runtime, not a language. This has been my point all along. C/C++ or Rust don't guarantee low-latency realtime properties, and introducing tracing GC doesn't guarantee high-latency non-realtime properties.
> You know that's what the Drop trait is for, right? All you have to do is add whatever memory management code you'd have (in your C program) into the trait implementation and your memory deallocation will behave exactly as it would in any other low level language.
Great, but it doesn't guarantee any properties of code you haven't written, so it still can't achieve the global properties I've been talking about.
> C/C++ or Rust don't guarantee low-latency realtime properties, and introducing tracing GC doesn't guarantee high-latency non-realtime properties.
We completely agree.
> Great, but it doesn't guarantee any properties of code you haven't written, so it still can't achieve the global properties I've been talking about.
How is this any different from C/C++? They do not give you any guarantees that Rust takes away in this regard. Any library that uses Box::new or vec! is exactly the same as a C library that calls malloc/free internally, and you can implement the same heap-allocation-free algorithms in Rust as you can in C/C++.
I don't understand what global properties you expect a low level systems language to guarantee. None of these languages can guarantee that code you haven't written doesn't heap allocate; you have to check yourself that it doesn't call malloc/free.
> Your runtime simply needs to bound the amount of reclamation work done at any given time.
Wouldn't this transform the problem into a "no more predictable maximum memory usage" problem? You can't really know if and when your GC will keep up with the amount of work to do.
Possibly, but maximum memory usage is rarely predictable anyway. I expect it might be even less predictable than maximum latency.
However, it may still be possible to conservatively bound your maximum memory usage too: as long as your reclamation-work phase keeps up with your program's allocation rate, you reach a steady state.
Suppose some amount of reclamation is done on malloc(), a tunable parameter could measure the ratio of allocation speed of the running program and amount of unreclaimed garbage. This ratio would control how much reclamation work to do before returning from malloc() so you can fall into steady-state.
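A rough sketch of that pacing idea, as a toy rather than a real allocator. `PacedHeap`, `backlog_per_step`, and `max_steps` are hypothetical names, blocks are plain `Vec<u8>`s, and "garbage" is whatever the program has handed back through `release`. Each `alloc` does a capped amount of reclamation that scales with the backlog before it returns.

```rust
use std::collections::VecDeque;

/// Toy heap: released blocks wait in a queue and are freed a few at a time.
pub struct PacedHeap {
    garbage: VecDeque<Vec<u8>>,
    /// Tunable: one extra reclamation step per this many queued blocks.
    backlog_per_step: usize,
    /// Hard cap on blocks freed per allocation (the latency bound).
    max_steps: usize,
}

impl PacedHeap {
    pub fn new(backlog_per_step: usize, max_steps: usize) -> Self {
        assert!(backlog_per_step > 0, "backlog_per_step must be non-zero");
        PacedHeap { garbage: VecDeque::new(), backlog_per_step, max_steps }
    }

    /// Free a bounded number of queued blocks, then allocate.
    pub fn alloc(&mut self, size: usize) -> Vec<u8> {
        let wanted = 1 + self.garbage.len() / self.backlog_per_step;
        let steps = wanted.min(self.max_steps);
        for _ in 0..steps {
            match self.garbage.pop_front() {
                Some(block) => drop(block), // the actual free happens here
                None => break,
            }
        }
        vec![0u8; size]
    }

    /// Hand a block back; it is queued instead of being freed immediately.
    pub fn release(&mut self, block: Vec<u8>) {
        self.garbage.push_back(block);
    }

    pub fn backlog(&self) -> usize {
        self.garbage.len()
    }
}

/// Returns the backlog right after a burst, then after a steady workload.
pub fn paced_demo() -> (usize, usize) {
    let mut heap = PacedHeap::new(4, 3);
    for _ in 0..20 {
        heap.release(vec![0u8; 64]);
    }
    let _first = heap.alloc(64); // frees at most 3 of the 20 queued blocks
    let after_burst = heap.backlog();
    for _ in 0..100 {
        let block = heap.alloc(64);
        heap.release(block);
    }
    (after_burst, heap.backlog())
}
```

With these numbers the burst drains by three blocks per allocation, and a program that releases one block per allocation settles at a small constant backlog. The cap is what bounds latency; the backlog, and so memory, stays bounded only if the allocation rate cooperates.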
> Possibly, but maximum memory usage is rarely predictable anyway. I expect it might be even less predictable than maximum latency.
Well, if you don't need a bound on memory usage you can just never deallocate.
> Suppose some amount of reclamation is done on malloc(), a tunable parameter could measure the ratio of allocation speed of the running program and amount of unreclaimed garbage. This ratio would control how much reclamation work to do before returning from malloc() so you can fall into steady-state.
Sure, but that doesn't guarantee anything about what your maximum spikes are going to be. You can have a firm bound on memory consumption or a firm bound on latency, but you can't get both without doing some serious application-specific work.
I keep seeing this latency claim about GC, but it would be trivial to solve with free if it were actually a problem: just add freed objects to a list and incrementally free over time to achieve whatever latency guarantees you wish.
The reason why no malloc/free implementations that I'm aware of actually do this is that the latency of freeing isn't a problem in practice.
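For what it's worth, the "add freed objects to a list" idea is only a few lines in Rust. This is a sketch under assumptions: `DeferredDrops` is a hypothetical helper, not something from std or any real allocator, and the budget counts top-level values, so a single queued value that owns a large graph still costs whatever its own destructor costs.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: holds values whose destruction has been deferred.
pub struct DeferredDrops<T> {
    pending: VecDeque<T>,
}

impl<T> DeferredDrops<T> {
    pub fn new() -> Self {
        DeferredDrops { pending: VecDeque::new() }
    }

    /// Queue a value instead of dropping it right away.
    /// Note: the queue itself may allocate when it grows.
    pub fn defer(&mut self, value: T) {
        self.pending.push_back(value);
    }

    /// Drop at most `budget` queued values; returns how many were dropped.
    pub fn reclaim(&mut self, budget: usize) -> usize {
        let mut done = 0;
        while done < budget {
            match self.pending.pop_front() {
                Some(value) => {
                    drop(value);
                    done += 1;
                }
                None => break,
            }
        }
        done
    }

    pub fn pending(&self) -> usize {
        self.pending.len()
    }
}

/// Queues ten boxes, then reclaims them in two bounded passes.
pub fn deferred_demo() -> (usize, usize, usize) {
    let mut queue = DeferredDrops::new();
    for i in 0..10 {
        queue.defer(Box::new(i));
    }
    let first = queue.reclaim(4);
    let left = queue.pending();
    let second = queue.reclaim(100);
    (first, left, second)
}
```

The caller decides when to call `reclaim` and with what budget, for example once per loop iteration. The cost is that destruction is no longer tied to scope exit.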
> The reason why no malloc/free implementations that I'm aware of actually do this is that the latency of freeing isn't a problem in practice.
Partly, and the other part is that it degrades allocation performance for the majority of non-problematic programs, which is what most people actually focus on.
But if we're being fair, latency of tracing GC isn't a problem for most programs either. So latency is largely a red herring, except when it's not, and you had better know when it's not, regardless of whether you're using C/C++/Rust or a runtime with tracing GC.
It's worth mentioning that there are several strategies for avoiding cascading deallocations like arenas or arena-backed graph abstractions. For example:
Indeed, if you can live with the wasted memory use of objects outliving their conceptual lifetime, regions/arenas are a good solution.
Note however that you'd probably still have to run destructors when destroying an arena (to free file handles for instance), so you can still see high latency. With an arena you can perhaps schedule this better though.
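As an illustration of the arena approach, here is a minimal index-based sketch. The `Arena`, `Node`, and `NodeId` names are invented for this example. Nodes live in one `Vec` and refer to each other by index, so tearing down the whole graph is a single buffer deallocation rather than one free per node.

```rust
/// Index of a node inside the arena (used instead of a pointer).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NodeId(usize);

pub struct Node {
    pub value: u64,
    pub next: Option<NodeId>,
}

/// All nodes share one backing buffer.
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    pub fn with_capacity(capacity: usize) -> Self {
        Arena { nodes: Vec::with_capacity(capacity) }
    }

    /// Store a node and return its index.
    pub fn alloc(&mut self, value: u64, next: Option<NodeId>) -> NodeId {
        self.nodes.push(Node { value, next });
        NodeId(self.nodes.len() - 1)
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

/// Builds a linked chain of `n` nodes in an arena and sums their values.
pub fn chain_sum(n: u64) -> u64 {
    let mut arena = Arena::with_capacity(n as usize);
    let mut head = None;
    for value in 0..n {
        head = Some(arena.alloc(value, head));
    }

    let mut sum = 0;
    let mut cursor = head;
    while let Some(id) = cursor {
        let node = arena.get(id);
        sum += node.value;
        cursor = node.next;
    }
    sum
} // `arena` is dropped here: one deallocation for the whole chain.
```

Because `Node` holds only plain data, dropping the `Vec` runs no per-element destructor. If nodes held file handles or other `Drop` types, each destructor would still run when the arena is dropped, which is the caveat raised above.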
> if you can live with the wasted memory use of objects outliving their conceptual lifetime, regions/arenas are a good solution.
If you can live with the wasted memory use of objects outliving their conceptual lifetime, garbage collectors can be a good solution too.
Not that that's a bad thing for many use cases, but your above comment implies a comparison between Rust and GC. I think the quoted critique falls down a bit when Rust lets you opt-in to generational-esque GC-ish behavior with a very similar downside to what you'd get from a GC.
> Lifetimes are compile-time only and do not do any reference counting.
I never said they did, I said lifetimes and reference counting both have this pathological case.
C also doesn't provide latency guarantees, as the same pathological programs can exist in C as well. It's a total myth that you need C in realtime domains due to "latency".
Maximum pause times are a property of a particular runtime, not a language.
> With GC you are not in control so there are fewer choices. You will not be able to make guarantees.
Not true. Hard and soft realtime GCs with sub-microsecond latencies exist. Latency is a property of a runtime, not of manual vs. automatic storage reclamation.
"No latency" is a fiction. The only question of any relevance is how much latency is tolerable for a given domain. And describing latency in worst-case timings is standard, so I understand latency just fine thanks.
What is the pathological case with lifetimes? You're saying it takes "time proportional to the number of dead objects to free", but as the parent said, lifetimes are a compile-time construct, so they have no runtime properties.
(I'm not saying that for sure there are none, I'm saying that it seems like you're talking about refcounting only, the lifetime bit is unclear to me.)
I suspect they're referring to graphs of Drop implementors, based on the sibling thread. If you for some reason have a linked-sea-of-nodes data structure that has to traverse itself on drop, that can behave similarly to dropping an Rc graph, though it still doesn't use lifetimes.
Yes, C/C++ would also have this problem. The point I was trying to make is that incrementality/latency is a property of a runtime. If your program has deep ownership graphs, any kind of naive reclamation procedure is going to have high latency, even if it's written in C/C++.
Wait, why would you have to do that? Most ownership is deterministically resolved at compile time, so you can know exactly when a resource will be freed. What you do have to know is about the rare refcounted variable, and what edge cases require ownership checking at runtime.
The latency problem isn't caused by determining what to free, the latency problem is caused by actually freeing. Imagine an array with 2^31 pointers, and now fill it with 2^31 distinct pointers to the remainder of the 2^32 bit address space. When that array goes out of scope, you can now enjoy 2^31 individual free operations, because reclamation for Rust lifetimes and reference counting are both proportional to the number of dead objects (copying collection takes zero time in this case).
If bounded latency is a goal, you have to bound the depth of your ownership graph if you're working on a platform that doesn't impose global latency properties. C/C++ and Rust do not do this.
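A scaled-down version of that scenario, as a sketch. `Counted` and `frees_on_scope_exit` are hypothetical names, and the counter stands in for the real `free` calls. When the `Vec` of boxes goes out of scope, one destructor (and one deallocation) runs per element, so the work at scope exit is proportional to the number of dead objects.

```rust
use std::cell::Cell;

/// Hypothetical object that counts how many times a destructor has run.
pub struct Counted<'a> {
    frees: &'a Cell<usize>,
}

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.frees.set(self.frees.get() + 1);
    }
}

/// Returns (destructor calls before scope exit, destructor calls after).
pub fn frees_on_scope_exit(n: usize) -> (usize, usize) {
    let frees = Cell::new(0);
    let before;
    {
        // `n` separately boxed objects owned by a single array.
        let _objects: Vec<Box<Counted<'_>>> =
            (0..n).map(|_| Box::new(Counted { frees: &frees })).collect();
        before = frees.get();
    } // The array goes out of scope: every element is destroyed right here.
    (before, frees.get())
}
```

`frees_on_scope_exit(10_000)` reports zero destructor calls while the array is alive and 10,000 once it is gone, all of them at a single closing brace.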
This is only true for pure two-space copying collectors, which are rarely used in practice because of the absurd memory overhead. Once you introduce mark/sweep for some portion of the heap (like production GCs do), you reintroduce overhead proportional to the number of dead objects during the sweep phase.
But C has this problem as well. If you've malloc(3)ed an array of 2^31 pointers, each pointing to an object, enjoy your 2^31 free(3)s, or prepare to start leaking RAM.
Yes, C/C++ and Rust have the same problems, as I said elsewhere. My ultimate point is that low latency is a property of a runtime, not a language. Using C/C++ or Rust isn't going to automatically give you bounded latency, and adding tracing GC doesn't automatically take it away.
The language and runtime typically cooperate to provide the operational semantics that developers are looking for. The case of Objective-C is especially interesting in this regard: developers evolved a number of conventions around reference counting, because reference counting allowed for controllable, minimal latency and contributed to a snappy UI. The language gradually absorbed these conventions into the compiler, such that certain patterns of use are part of the language specification (certain method names, basically) and the ARC code is generated for developers.
I believe their point is that an object can own potentially many objects, and when it is dropped it could cause a cascade of dropping. Which may not be expected by someone.
Well, the good news for you then is that there is progress underway to implement Gc-as-a-library in Rust, to give you this option if you need it as well.
The latency of your Rust or C program is "static" in that you can infer it from the program text. This is not actually true of most garbage collected languages. (Erlang, with per-process heaps, is a notable exception.)
> even high-powered hardware can take a "major" hit from a GC pause when your application is extremely latency sensitive.
That's true, but high-powered, abundant-RAM realtime applications can use approaches that are cheaper than Rust's. See, e.g., the interesting work currently being done updating realtime Java[1]. The idea is that memory is composed of a few kinds of lifetimes: eternal, scoped and GCed-heap. Scoped memory is basically nested arenas, and GCed heap contains objects that are used by non-realtime portions of the app (which, even in realtime systems, may comprise the majority of code, especially when the system runs on large servers).
An approach like Rust's, however, is crucial when the application is RAM and/or energy constrained.
If I had infinite free time, I'd love to explore the problem space of implementing interpreters for GC-based languages on top of Rust. It's quite hard to get the concurrency right, and indeed we see a number of major languages that gave up on even trying.
What for? If you have the extra RAM and power for a GC, you don't need Rust for safety. HotSpot's next-gen (JIT) compiler is written in Java and is absolutely amazing.
> If you have the extra RAM and power for a GC, you don't need Rust for safety.
It isn't just about memory; Rust's safety guarantees in combination with RAII also mean that other resources such as mutex locks, open files, etc. also get closed in a deterministic fashion. (I'd argue that this is quite important for locks, but I've run into hard-to-debug bugs b/c files weren't being closed out until a GC got to them.)
The way I've always viewed it is that RAII is general to all resources; GC only solves memory.
(I'm assuming the comment you're responding to is discussing getting a concurrent GC to work quite right, which isn't fully relevant w/ my reply; but I do think there is more to Rust's safety than just the memory management, which is what I got from your reply. I'd also argue that memory, in particular, is not abundant, both on mobile devices, but also out in the cloud, where it translates directly into cost both from more expensive VM instances, and from me needing to continually tweak the GC's params.)
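The deterministic-release point for locks can be sketched like this, using only `std::sync::Mutex` (the `guard_demo` function is just for illustration). The guard returned by `lock()` releases the mutex when it goes out of scope, not at some later collection.

```rust
use std::sync::Mutex;

/// Returns (lock held inside the scope, lock free after it, final value).
pub fn guard_demo() -> (bool, bool, u32) {
    let counter = Mutex::new(0u32);
    let held_inside;
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        // While `guard` is alive, a non-blocking attempt to lock fails.
        held_inside = counter.try_lock().is_err();
    } // `guard` is dropped here, which unlocks the mutex.

    // The lock is available again immediately after the closing brace.
    let free_after = counter.try_lock().is_ok();
    let value = *counter.lock().unwrap();
    (held_inside, free_after, value)
}
```

The same pattern applies to `std::fs::File`, whose handle is closed when the value is dropped.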
I was referring to hinkley's idea for using Rust to write GCed VMs. As to other safety features, those are easily added to cheaper GCed languages. Memory is the hard bit, and if you can afford a GC, it is usually cheaper to just use one. As to RAM being costly, I think RAM is one of the few things that is getting very cheap relative to other resources, and GCs require less and less tuning; working hard to avoid a GC when you can afford one seems to me like the mother of all premature optimizations. But I see no point in debating the issue too much. Every company would make its own consideration about which approach is cheaper.
In any event, there are certainly very important use cases that simply cannot afford the power and RAM overhead required by a GC (again -- latency is not an issue; if you have the resources, there are cheap ways of getting extremely low latencies without doing away with a GC) and those use cases would benefit tremendously from a safe language.
I was in fact thinking of getting concurrent GC to work right. [edit] but also concurrency in general. Global Interpreter Locks when even my laptop has 8 cores?
I also agree that the free memory lunch is going to be over for a while. Java in particular is going to lose out in the container space. I don't think it's an accident that they've suddenly begun taking memory footprint very seriously. They have to.
What other programming languages do you know of that have an 'absolutely amazing' GC implementation? Wouldn't you like that answer to be 'lots'?
The Java team has worked a lot longer and a lot harder on this problem than pretty much everyone else, and even they hit a wall at 1GB. One that took a dreadfully long time to overcome (so long in fact, that it contributed to me being an ex Java developer)
Java has quite a few very good GCs, some in OpenJDK, some by Oracle, and one by Azul. Quite a few of them don't have a 1GB wall. They will become even better when Java finally gets value types and the GC won't have to work hard to do stuff it doesn't have to (this is why Go has decent GC performance even though its GC isn't very sophisticated). In any event, I don't see how Rust can make the work any easier. Coming up with that algorithm is 98% of the job.
Lots of languages do have good GC because they run on the JVM. OTOH, we don't know how hard it is to write similar kinds of applications in Rust. Good things take a while to get right -- Java has taken a while, and Rust has, too. It will be a while yet until Java has a GC that everyone likes, and it will be a while yet until Rust is fully fleshed out and its strengths and weaknesses understood. My personal opinion is that the two approaches are complementary, each being superior in a different domain.
That's not true in general. I've used realtime Java in a safety critical hard realtime application (running on a large server), with strong deadline guarantees (we're talking microsecond range). If you have the power and RAM to spare, the predictability issue is more cheaply solved with the approach I mentioned above (by cheaply I mean in terms of development costs; it is more costly than "plain" GC in terms of effort, but still cheaper than the Rust approach).
The only real cost of GC these days is RAM (and power).
Sun's Java Real-Time System. It is no longer supported, AFAIK, and I don't know which RT JVM the project has switched to because I'm no longer there (my guess would be IBM's).
Real-time GCs exist. Look up Aonix's Java stuff for what embedded or predictable apps do. Or JamaicaVM below. For enterprise, Azul has some amazing GC tech plus Java CPUs (Vega).
However that's not something that is automatically solved by manual memory management. Using malloc/free on a desktop OS also does not provide predictable runtime behavior, although unexpected pauses might be smaller than with most GCs.
The safest bet for predictable memory management and latency is the approach that is used by lots of embedded and realtime software: Don't allocate at all. Or at least don't do it in critical phases.
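One common shape of "don't allocate" is a fixed-capacity buffer sized up front. A minimal sketch follows. `Ring` is a hypothetical type using const generics; it stores its elements inline, and `push` reports failure when the buffer is full instead of growing.

```rust
/// Fixed-capacity FIFO queue with inline storage and no heap allocation.
pub struct Ring<const N: usize> {
    buf: [u32; N],
    head: usize, // index of the oldest element
    len: usize,  // number of stored elements
}

impl<const N: usize> Ring<N> {
    pub const fn new() -> Self {
        Ring { buf: [0; N], head: 0, len: 0 }
    }

    /// Appends a value, or hands it back as `Err` when the buffer is full.
    pub fn push(&mut self, value: u32) -> Result<(), u32> {
        if self.len == N {
            return Err(value);
        }
        let tail = (self.head + self.len) % N;
        self.buf[tail] = value;
        self.len += 1;
        Ok(())
    }

    /// Removes and returns the oldest value, if any.
    pub fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        let value = self.buf[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(value)
    }
}
```

Both operations take constant time and never touch the allocator, so the worst case is visible in the code. The caller has to decide what a full buffer means (drop the sample, overwrite, signal an error), which is exactly the kind of decision realtime code wants made explicitly.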
I think this is the point that is obscured when discussing "manual memory management" vs "GC" languages and just focusing on the behaviour/life-cycle of an individual allocation: the former generally provide tools and features that make it easier/more natural to avoid allocations, whereas the latter make the assumption that allocation is usually OK (which is, of course, a perfectly acceptable trade-off for the domains those languages target).
That's not true in general: only some GC algorithms make code behaviour unpredictable. Real-time tracing GCs exist. Reference counting is GC too, and it can be unpredictable as well.