r/rust Confused with how std::borrow::Cow works and why it's useful.

posted by u/blureglades on 12 Dec 2020

Hi! I do hope everyone is doing well. Pretty much as the title says, I'm very confused with the Cow smart pointer usage and how it works and when one should use it.

Citing the documentation, it says:

The type Cow is a smart pointer providing clone-on-write functionality: it can enclose and provide immutable access to borrowed data, and clone the data lazily when mutation or ownership is required

It kindly provides one of the following examples for its into_owned method:

// Extracts the owned data.
// Clones the data if it is not already owned.
use std::borrow::Cow;

let s = "Hello world!"; 
let cow = Cow::Borrowed(s);  

assert_eq!(
    cow.into_owned(),
    String::from(s)
); 

So, from my understanding, it borrows a variable but clones it whenever mutation occurs.

I have the following questions:

  1. Is this another way of doing clone() in a type?
  2. How does Cow differ from RefCell's borrow()?
  3. In which other practical cases might Cow be useful?

My apologies if my concern does not make sense, I'm doing my best to understand smart pointers. I'd deeply appreciate any guidance. Thank you in advance.

Much love!

posted by u/Lucretiel on 12 Dec 2020 118▲

Cow is... not misleadingly named, exactly, because the copy-on-write functionality is important and forms a large part of the interface. But I feel like it sells short the true power of the type.

Cow is essentially* this:

enum Cow<'a, T> {
    Borrowed(&'a T),
    Owned(T),
}

It's an enum that lets you choose whether you want the borrowed or owned version of a type, and it's in cases where that choice is important that the type is most useful / powerful.

My favorite example is a hypothetical JSON parser. Consider JSON, read from a file into a Vec<u8>, containing JSON strings. Most of the time the actual bytes of a string in the file are a perfectly accurate representation of how Rust will treat the string, which means a parser could return a &str representing a slice of the original file. On the other hand, if there are any escaped characters (\n, etc), that won't suffice; the JSON parser needs to turn those into real code points, so a real String has to be built. This is where Cow shines; it allows a Rust function to say "I'm going to return either a borrowed form of this data or an owned form, I'm not sure which."
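A minimal sketch of that idea, not a real JSON parser: unescape is a hypothetical helper that only knows the escapes \n, \" and \\, and it assumes the caller has already cut out the text between the quotes.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes `\n`, `\"` and `\\` in the body of a JSON string.
/// Borrows the input when there is nothing to decode; allocates only otherwise.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Common case: the bytes in the file are already the final string.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            // Unknown escape: kept verbatim here (a real parser would report an error).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing backslash: kept verbatim as well.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

Calling unescape("asdf") gives back a Cow::Borrowed pointing into the input, while an input containing a backslash escape comes back as a Cow::Owned holding the decoded String.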

A great real-life example is shell_escape::escape. This pretty much does exactly what I just described: it processes a string to make it safe for use in a shell. Most of the time, the string can be returned unchanged, but if any changes need to be made (adding escaped spaces, for example) then it needs to return a String containing the corrections.

Like I said, I don't want to undersell the usefulness of the copy-on-write functionality. But when considering types for your use case, Cow should be what you reach for to express "either a reference or an owned type".

* The real definition of Cow is slightly more general; it allows you to express things like Borrowed(&str) / Owned(String), whereas my example would require something like Borrowed(&String) / Owned(String). But the idea is the same. Similarly, to answer your other questions, ToOwned is like the slightly more general version of Clone. Clone requires the new object to have an identical type as the old one, whereas ToOwned can represent a &str -> String or &[u8] -> Vec<u8>, in addition to inheriting the &T -> T behavior of Clone.
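To make the ToOwned point concrete, here is a small sketch using only std; owned_forms is just an illustrative name.

```rust
/// Shows the three conversions: &str to String, &[u8] to Vec<u8>, &i32 to i32.
fn owned_forms() -> (String, Vec<u8>, i32) {
    let s: &str = "moo";
    let bytes: &[u8] = &[1, 2, 3];
    let n: &i32 = &7;

    // ToOwned may change the type: a str becomes a String, a slice becomes a Vec.
    let owned_s: String = s.to_owned();
    let owned_bytes: Vec<u8> = bytes.to_owned();
    // For any T: Clone, ToOwned is implemented via Clone, so &i32 gives an i32.
    let owned_n: i32 = n.to_owned();

    (owned_s, owned_bytes, owned_n)
}
```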

posted by u/mqudsi on 12 Dec 2020 29▲

Yes, the only shame is that to_lowercase() and other maybe-mutating string functions don’t use this approach.
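Roughly what such a function could look like. This is a sketch, not a std API: to_lowercase_cow is a made-up name, and it scans the string once before deciding whether to allocate.

```rust
use std::borrow::Cow;

/// Hypothetical helper, not part of std: lowercases only when something would change.
fn to_lowercase_cow(s: &str) -> Cow<'_, str> {
    // A char needs work if its lowercase mapping is anything other than itself.
    let needs_change = s
        .chars()
        .any(|c| c.to_lowercase().ne(std::iter::once(c)));
    if needs_change {
        Cow::Owned(s.to_lowercase())
    } else {
        // Already lowercase: hand back the input without allocating.
        Cow::Borrowed(s)
    }
}
```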

posted by u/runiq on 12 Dec 2020 3▲

Is there a crate that fixes that somehow? (On mobile RN, but I'd like to inquire for later)

posted by u/CoronaLVR on 12 Dec 2020 13▲

If you use to_lowercase() just to compare strings case-insensitively, the unicase crate can do it without allocation.

posted by u/runiq on 12 Dec 2020 1▲

Thanks! :)

posted by u/bahwi on 12 Dec 2020 9▲

Thanks for this explanation. A few years with Rust and never bothered with Cow. Working on a file format and parser. Converting it to make better use of serde's zero-copy support, but it looks like this is another place where I can gain some better memory usage (and thus speed for my use case).

posted by u/blureglades on 12 Dec 2020 5▲

Thank you so much for the explanation, it's clear now :-)

posted by u/ImYoric on 12 Dec 2020 31▲

My main use case for `Cow` is avoiding (string) copies. Consider

fn how_many_potions(num: usize) -> std::borrow::Cow<'static, str> {
    match num {
        0 => "out of potions".into(),
        1 => "last potion".into(),
        _ => format!("{} potions remaining", num).into()
    }
}

Essentially, we want to return a String here, but there are cases in which we can do better because we actually have a &'static str containing the data, so we don't need to allocate and copy a brand new String for this purpose.

This is exactly what Cow is for.
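To see which arms allocate, you can check the variant that comes back. A quick sketch: allocated is a hypothetical helper added only for this check, and the function is repeated so the snippet stands alone.

```rust
use std::borrow::Cow;

fn how_many_potions(num: usize) -> Cow<'static, str> {
    match num {
        // &'static str converts into Cow::Borrowed: no allocation.
        0 => "out of potions".into(),
        1 => "last potion".into(),
        // String converts into Cow::Owned: the only arm that allocates.
        _ => format!("{} potions remaining", num).into(),
    }
}

/// Reports whether the returned message had to be heap-allocated.
fn allocated(num: usize) -> bool {
    matches!(how_many_potions(num), Cow::Owned(_))
}
```

Either way the caller can use the result like a &str, because Cow<str> derefs to str.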

Does this example make sense to you?

posted by u/blureglades on 12 Dec 2020 4▲

Makes sense. Thanks for the example, man!

posted by u/LonelyStruggle on 12 Dec 2020 3▲

I’m very ignorant about Rust: doesn’t it do any return value copy elision?

posted by u/Sharlinator on 12 Dec 2020 15▲

It (well, the LLVM) can, but it is not guaranteed like in C++. And anyway the function couldn't just return a String that points to 'static data, because String's invariant is that it owns its contents.

posted by u/__s on 12 Dec 2020 8▲

This doesn't have to do with copy elision. String content is stored in the heap. Rust doesn't have copy constructors. Moves are always a memcpy (barring compiler optimization), which makes these optimizations transparent outside of digging into pointer values. This does pose some challenges for things like self referencing structs since moving them would invalidate their references. In those cases references have to be replaced with relative offsets (which gets into a mess)

See also https://doc.rust-lang.org/std/pin/index.html which gets into the idea of unmovable values to allow safe self reference

posted by u/OS6aDohpegavod4 on 12 Dec 2020 7▲

I'm going to paraphrase what I picked up from other comments here as I've also been struggling to understand this. Please let me know if this is wrong:

It's useful for when you maybe need to mutate a string. If your function returns a &str then you couldn't mutate it, if it returns a String then you have to allocate memory for no reason. A Cow allows the best of both worlds. It allows expressions to type check because it's an enum.

Just like Result is maybe success, Option is maybe exists, Cow is maybe mutation-needed.

posted by u/CoronaLVR on 12 Dec 2020 4▲

That is correct.

Note that any enum with 2 variants can do what Cow does, it's not magic.

It is just specialized to work with a borrowed variant and its owned counterpart, so if you have a Cow<str> and you call to_mut() on it, it will know automagically to allocate a String and give you a mutable reference to it.
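A small sketch of to_mut() in action; ensure_bang is a made-up function for illustration.

```rust
use std::borrow::Cow;

/// Appends '!' only if it is missing; returns the Cow so callers can inspect it.
fn ensure_bang(mut text: Cow<'_, str>) -> Cow<'_, str> {
    if !text.ends_with('!') {
        // to_mut() clones a borrowed str into a String on first use,
        // then hands back a &mut String.
        text.to_mut().push('!');
    }
    text
}
```

Passing Cow::Borrowed("moo!") returns the same borrowed value untouched, while Cow::Borrowed("moo") comes back as Cow::Owned holding "moo!".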

posted by u/llogiq on 12 Dec 2020 5▲

Look at holy Cow.

posted by u/tafia97300 on 12 Dec 2020 2▲

This is one of the main reasons quick-xml / quick-protobuf are fast too.

posted by u/blureglades on 12 Dec 2020 1▲

Thanks everyone for the kind responses!

posted by u/__s on 12 Dec 2020 0▲

  1. I don't understand this question

  2. Cow allows sharing data. If I have 10 Cows referencing the same static string, I only have 1 copy of that data. With RefCell you'll have 10 copies, assuming that it's a RefCell<String> versus a Cow<'static, str>

  3. In a card game engine I have skills which are an enum. On instances there's a mapping HashMap<Event, Cow<'static, [Skill]>>. There's a statically generated list of cards which have a &'static [Skill] which is easily inserted into this HashMap without copying the list. If the list is modified by some effect then it gets converted to an owned Cow. I tend to have lots of clones of game state, so this provides value in two ways: cloning a reference is cheap & a cloned reference doesn't multiply the memory usage by the size of the slice. It's also useful in effects where I add a skill as a plain Cow::from(&[Skill::constantthing])
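A rough sketch of the pattern in point 3. Skill, Event, CARD_SKILLS and the two functions are hypothetical stand-ins, not the actual engine code.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Hypothetical stand-ins for the engine's real types.
#[derive(Clone, Debug, PartialEq)]
enum Skill {
    Strike,
    Heal,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Event {
    Play,
    Death,
}

// Statically generated card data.
static CARD_SKILLS: &[Skill] = &[Skill::Strike];

type Skills = HashMap<Event, Cow<'static, [Skill]>>;

fn new_instance() -> Skills {
    let mut skills = Skills::new();
    // Borrowed: only a pointer and a length are stored, the list is not copied.
    skills.insert(Event::Play, Cow::Borrowed(CARD_SKILLS));
    skills
}

fn add_skill(skills: &mut Skills, event: Event, skill: Skill) {
    // to_mut() clones the static slice into a Vec the first time it is modified.
    skills.entry(event).or_default().to_mut().push(skill);
}
```

Cloning a Skills map that only holds borrowed entries copies references, and only the clone that calls add_skill ends up with its own Vec.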

Another example of how Cow could be used: imagine you're parsing JSON. If the Cow only needs to live as long as the JSON string, then you could have Cow<'a, str> reference substrings of the JSON string. For "asdf" this works. But if you have "qu\"ote" then you'll want the Cow to be an owned String qu"ote since you don't want the reference qu\"ote. This allows you to be nearly zero-copy.

posted by u/OS6aDohpegavod4 on 12 Dec 2020 2▲

#2 Really? I thought RefCell is a reference type, so why would it copy?

posted by u/__s on 12 Dec 2020 1▲

RefCell contains T: https://doc.rust-lang.org/src/core/cell.rs.html#571

It's a reference type in that compared to Cell it has an interface to borrow, so it's useful for types which don't implement Copy

You get copies if you have 10 RefCells. The point of Cow is that if I have Cow::Borrowed it won't change. They're different things. RefCell is interior mutability, Cow is possibly-shared possibly-owned

When you clone a RefCell it clones the interior value
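A quick sketch of that, using only std; clone_refcell is just an illustrative name.

```rust
use std::cell::RefCell;

/// Clones a RefCell<String> and mutates the copy; returns (original, copy).
fn clone_refcell() -> (String, String) {
    let original = RefCell::new(String::from("moo"));
    // Cloning a RefCell<T> clones the inner T, so this allocates a second String.
    let copy = original.clone();
    // Interior mutability: mutate through a shared handle, no `mut` binding needed.
    copy.borrow_mut().push('!');
    (original.into_inner(), copy.into_inner())
}
```

The original still holds "moo" afterwards, since the two cells own separate Strings.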

posted by u/OS6aDohpegavod4 on 12 Dec 2020 2▲

Ah I see. So if you have a Cow::Owned then cloning would still copy just like RefCell, but the benefit is that cloning a Cow::Borrowed won't copy?

posted by u/__s on 12 Dec 2020 1▲

Yes

posted by u/OS6aDohpegavod4 on 12 Dec 2020 1▲

Cool, thank you.

posted by u/sellibitze on 12 Dec 2020 1▲

that cloning a Cow::Borrowed won't copy?

It would copy a reference (which is cheap to do).
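One way to check this is to compare data pointers before and after the clone. A sketch with two hypothetical helpers, same_buffer and clone_shares_buffer:

```rust
use std::borrow::Cow;

/// Returns true if `a` and `b` start at the same address in memory.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

/// Clones the Cow and reports whether the clone still points at the same bytes.
fn clone_shares_buffer(cow: &Cow<'_, str>) -> bool {
    let copy = cow.clone();
    // &Cow<str> deref-coerces to &str for the comparison.
    same_buffer(cow, &copy)
}
```

For a Cow::Borrowed the clone points at the same bytes; for a Cow::Owned holding a non-empty String the clone gets its own heap buffer.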