Hi! I do hope everyone is doing well. Pretty much as the title says, I'm very confused about how the Cow smart pointer works and when one should use it.
Citing the documentation, it says:
The type Cow is a smart pointer providing clone-on-write functionality: it can enclose and provide immutable access to borrowed data, and clone the data lazily when mutation or ownership is required.
It kindly provides the following example for its into_owned method:
// Extracts the owned data.
// Clones the data if it is not already owned.
use std::borrow::Cow;
let s = "Hello world!";
let cow = Cow::Borrowed(s);
assert_eq!(
cow.into_owned(),
String::from(s)
);
So, from my understanding, it borrows a variable but clones it whenever mutation occurs.
I have the following questions:
1. clone() in a type?
2. Cow differentiates from RefCell's borrow()?
3. Cow might be useful?
My apologies if my concerns do not make sense; I'm doing my best to understand smart pointers. I'd deeply appreciate any guidance. Thank you in advance.
Much love!
My main use case for `Cow` is avoiding (string) copies. Consider
fn how_many_potions(num: usize) -> std::borrow::Cow<'static, str> {
    match num {
        0 => "out of potions".into(),
        1 => "last potion".into(),
        _ => format!("{} potions remaining", num).into(),
    }
}
Essentially, we want to return a String here, but there are cases in which we can do better because we actually have a &'static str containing the data, so we don't need to allocate and copy a brand new String for this purpose.
This is exactly what Cow is for.
Does this example make sense to you?
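To make the saving visible, here is a small sketch that repeats the function above and adds a helper reporting which variant came back. The helper name is_borrowed is my own and only there for illustration.

```rust
use std::borrow::Cow;

// Same function as above: two arms return string literals, one builds a String.
fn how_many_potions(num: usize) -> Cow<'static, str> {
    match num {
        0 => "out of potions".into(),
        1 => "last potion".into(),
        _ => format!("{} potions remaining", num).into(),
    }
}

// Hypothetical helper: true if the Cow still points at borrowed data,
// i.e. no String was allocated for it.
fn is_borrowed(text: &Cow<'_, str>) -> bool {
    matches!(text, Cow::Borrowed(_))
}
```

Calling is_borrowed(&how_many_potions(0)) should be true, while the same check on how_many_potions(5) should be false, since only the format! arm has to allocate.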
Makes sense. Thanks for the example, man!
I’m very ignorant about Rust; doesn’t it do any return value copy elision?
It (well, LLVM) can, but it is not guaranteed like in C++. And anyway, the function couldn't just return a String that points to 'static data, because String's invariant is that it owns its contents.
This doesn't have to do with copy elision. String content is stored on the heap. Rust doesn't have copy constructors. Moves are always a memcpy (barring compiler optimization), which makes these optimizations transparent unless you dig into pointer values. This does pose some challenges for things like self-referencing structs, since moving them would invalidate their references. In those cases references have to be replaced with relative offsets (which gets into a mess).
See also https://doc.rust-lang.org/std/pin/index.html which gets into the idea of unmovable values to allow safe self reference
I'm going to paraphrase what I picked up from other comments here as I've also been struggling to understand this. Please let me know if this is wrong:
It's useful for when you maybe need to mutate a string. If your function returns a &str then you couldn't mutate it, if it returns a String then you have to allocate memory for no reason. A Cow allows the best of both worlds. It allows expressions to type check because it's an enum.
Just like Result is maybe success, Option is maybe exists, Cow is maybe mutation-needed.
That is correct.
Note that any enum with 2 variants can do what Cow does; it's not magic.
It is just specialized to work with a borrowed variant and its owned counterpart, so if you have a Cow<str> and you call to_mut() on it, it will automagically know to allocate a String (if it is currently borrowed) and give you a mutable reference to it.
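A minimal sketch of that to_mut() behaviour, assuming a made-up helper called ensure_exclaimed that only touches the text when it has to:

```rust
use std::borrow::Cow;

// Hypothetical helper: appends '!' only if the text does not already end with one.
fn ensure_exclaimed(text: &mut Cow<'_, str>) {
    if !text.ends_with('!') {
        // On a Borrowed value, to_mut() clones the str into a new String and
        // switches the Cow to Owned; on an Owned value it just borrows mutably.
        text.to_mut().push('!');
    }
}
```

A Cow::Borrowed("hello") passed through this ends up as Cow::Owned("hello!"), while Cow::Borrowed("hi!") is left borrowed because to_mut() is never called.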
This is one of the main reasons quick-xml / quick-protobuf are fast too.
Thanks everyone for the kind responses!
I don't understand this question
Cow allows sharing data. If I have 10 Cows referencing the same static string, I only have 1 copy of that data. With RefCell you'll have 10 copies, assuming that it's a RefCell<String> versus Cow<'static, str>.
In a card game engine I have skills, which are an enum. On instances there's a mapping HashMap<Event, Cow<'static, [Skill]>>. There's a statically generated list of cards, each of which has a &'static [Skill] that is easily inserted into this HashMap without copying the list. If the list is modified by some effect, then it gets converted to an owned Cow. I tend to have lots of clones of game state, so this provides value in two ways: cloning a reference is cheap, and a cloned reference doesn't multiply the memory usage by the size of the slice. It's also useful in effects where I add a skill as a plain Cow::from(&[Skill::constantthing]).
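A rough sketch of how such a mapping might look. Everything here (the Skill and Event variants, BASE_SKILLS, SkillMap, new_instance, add_skill) is a made-up stand-in, not the actual engine's code:

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical stand-ins for the engine's types.
#[derive(Clone, Debug, PartialEq)]
enum Skill {
    Draw,
    Heal,
    Burn,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Event {
    OnPlay,
}

// Statically defined skill list, shared by every card instance.
const BASE_SKILLS: &[Skill] = &[Skill::Draw, Skill::Heal];

type SkillMap = HashMap<Event, Cow<'static, [Skill]>>;

fn new_instance() -> SkillMap {
    let mut skills = SkillMap::new();
    // Borrowing the static slice: the list itself is not copied.
    skills.insert(Event::OnPlay, Cow::Borrowed(BASE_SKILLS));
    skills
}

// An effect that modifies the list. to_mut() converts a Borrowed entry
// into an owned Vec the first time it is called.
fn add_skill(skills: &mut SkillMap, event: Event, skill: Skill) {
    skills
        .entry(event)
        .or_insert_with(|| Cow::Owned(Vec::new()))
        .to_mut()
        .push(skill);
}
```

Cloning a SkillMap whose entries are still Borrowed copies only the slice references, so a snapshot taken before add_skill keeps pointing at the static list while the modified state owns its own Vec.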
Another example of how Cow could be used: imagine you're parsing JSON. If the Cow only needs to live as long as the JSON string, then you could have Cow<'a, str> reference substrings of the JSON string. For "asdf" this works. But if you have "qu\"ote" then you'll want the Cow to be an owned String qu"ote, since you don't want the reference qu\"ote. This allows you to be nearly zero-copy.
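Something along these lines, as a sketch only. The function name unescape is hypothetical, it expects the raw text between the quotes, and it handles just a few escapes (no \uXXXX, no error reporting) to stay short:

```rust
use std::borrow::Cow;

// Hypothetical helper: resolves backslash escapes in the body of a JSON string.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: nothing to rewrite, so hand back a slice of the input.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    // Slow path: build a new String with the escapes resolved.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            // Covers \" \\ and \/ by keeping the escaped character itself.
            Some(other) => out.push(other),
            // A trailing backslash is kept as-is in this sketch.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

With this, asdf comes back as Cow::Borrowed pointing into the input, and qu\"ote comes back as an owned String containing qu"ote.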
#2 Really? I thought RefCell is a reference type, so why would it copy?
RefCell contains T: https://doc.rust-lang.org/src/core/cell.rs.html#571
It's a reference type in that, compared to Cell, it has an interface to borrow, so it's useful for types which don't implement Copy.
You get copies if you have 10 RefCells. The point of Cow is that if I have Cow::Borrowed it won't change. They're different things: RefCell is interior mutability, Cow is possibly-shared, possibly-owned.
When you clone a RefCell it clones the interior value.
Ah I see. So if you have a Cow::Owned then cloning would still copy just like RefCell, but the benefit is that cloning a Cow::Borrowed won't copy?
Yes
Cool, thank you.
that cloning a Cow::Borrowed won't copy?
It would copy a reference (which is cheap to do).
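One way to check this yourself is to compare data pointers before and after cloning. A small sketch, with helper names that are mine and purely illustrative:

```rust
use std::borrow::Cow;

// Hypothetical helper: true if both Cows point at the very same bytes.
fn shares_storage(a: &Cow<'_, str>, b: &Cow<'_, str>) -> bool {
    a.as_ptr() == b.as_ptr()
}

// Cloning a Borrowed Cow copies only the reference.
fn clone_of_borrowed_shares() -> bool {
    let original: Cow<'static, str> = Cow::Borrowed("shared text");
    let copy = original.clone();
    shares_storage(&original, &copy)
}

// Cloning an Owned Cow clones the String into a fresh allocation.
fn clone_of_owned_shares() -> bool {
    let original: Cow<'static, str> = Cow::Owned(String::from("owned text"));
    let copy = original.clone();
    shares_storage(&original, &copy)
}
```

The first function should return true and the second false, which matches the exchange above: the Owned case copies the data, the Borrowed case only copies a pointer and a length.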
Cow is... not misleadingly named, exactly, because the copy-on-write functionality is important and forms a large part of the interface. But I feel like it sells short the true power of the type.
Cow is essentially* this:
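Roughly the following shape, as a simplified sketch rather than the standard library's actual definition (the real one is generic over ToOwned). The name SimpleCow and its two methods are made up for illustration:

```rust
// Simplified stand-in for std::borrow::Cow; the name is hypothetical.
enum SimpleCow<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

impl<'a, T: Clone> SimpleCow<'a, T> {
    // Immutable access works the same way for both variants.
    fn get(&self) -> &T {
        match self {
            SimpleCow::Borrowed(reference) => *reference,
            SimpleCow::Owned(value) => value,
        }
    }

    // Clones only if the data is not already owned.
    fn into_owned(self) -> T {
        match self {
            SimpleCow::Borrowed(reference) => reference.clone(),
            SimpleCow::Owned(value) => value,
        }
    }
}
```
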
It's an enum that lets you choose whether you want the borrowed or owned version of a type, and it's in cases where that choice is important that the type is most useful / powerful.
My favorite example is a hypothetical JSON parser. Consider JSON, read from a file into a Vec<u8>, containing JSON strings. Most of the time the actual bytes in the file of the string are a perfectly accurate representation of how Rust will treat the string, which means a parser could return a &str representing a slice of the original file. On the other hand, if there are any escaped characters (\n, etc), that won't suffice; the JSON parser needs to turn those into real code points in a string. This means that the borrowed version won't suffice, you need a real String to be built. This is where Cow shines; it allows a Rust function to say "I'm going to return either a borrowed form of this data or an owned form, I'm not sure which."
A great real-life example is shell_escape::escape. This pretty much does exactly what I just described: it processes a string to make it safe for use in a shell. Most of the time, the string can be returned unchanged, but if any changes need to be made (adding escaped spaces, for example) then it needs to return a String containing the corrections.
Like I said, I don't want to undersell the usefulness of the copy-on-write functionality. But when considering types for your use case, Cow should be what you reach for to express "either a reference or an owned type".
* The real definition of Cow is slightly more general; it allows you to express things like Borrowed(&str) / Owned(String), whereas my example would require something like Borrowed(&String) / Owned(String). But the idea is the same. Similarly, to answer your other questions, ToOwned is like the slightly more general version of Clone. Clone requires the new object to have an identical type as the old one, whereas ToOwned can represent a &str → String or &[u8] → Vec<u8>, in addition to inheriting the &T → T behavior of Clone.
Yes, the only shame is that to_lower() and other maybe-mutating string functions don’t use this approach.
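For what it's worth, a Cow-returning variant is not hard to sketch by hand. The function below is hypothetical (nothing like it is in std under this name) and is limited to ASCII so that the "would anything change?" check stays trivial:

```rust
use std::borrow::Cow;

// Hypothetical function: lowercases ASCII letters, but only allocates
// when the input actually contains an ASCII uppercase letter.
fn to_ascii_lowercase_cow(input: &str) -> Cow<'_, str> {
    if input.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(input.to_ascii_lowercase())
    } else {
        Cow::Borrowed(input)
    }
}
```

Input that is already lowercase comes back as a borrow of the original, and only mixed-case input pays for a new String. A full Unicode version would need a more careful check than this.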
Is there a crate that fixes that somehow? (On mobile RN, but I'd like to inquire for later)
If you use to_lower() just to compare strings case-insensitively, the unicase crate can do it without allocation.
Thanks! :)
Thanks for this explanation. A few years with Rust and I never bothered with Cow. I'm working on a file format and parser, converting it to make better use of serde's zero-copy support, and it looks like this is another place where I can gain better memory usage (and thus speed, for my use case).
Thank you so much for the explanation, it's clear now :-)