rust/treedirstate/src/filestore.rs
105	Do we need to set `at_end = false` right after the seek to 0, so that if any of the lines below (in particular line 118) exit early, we aren't left with the file positioned at 0 and `at_end == true`? Or do we consider the whole store invalid if this returns an error?

rust/treedirstate/src/filestore.rs
105	I think in practice if any of this fails we end up with an exception in Python and hg will exit, but we should keep it consistent in case someone later on decides to handle these errors.

rust/treedirstate/src/filestore.rs
105	I think we want to explicitly not depend on process exit for cleanup in the design of these new structures. We want daemon support as a first-class citizen. Perhaps in this case it means that if a caller ever receives an error, it should consider the structure invalid. Is there any convenient shorthand for something that can set `self.is_invalid = true` if a function returns an error? Kind of like what try/catch would allow.

rust/treedirstate/src/filestore.rs
105	Yes, I think errors most likely invalidate the structure. If we're in the middle of doing a `write_full`, for example, half the in-memory nodes will have IDs from the old store, and half will have IDs from the new store. `Result::map_err` and `Result::or_else` can be used to perform clean-ups like this. I think at a higher level (maybe in `Dirstate`?) we'll eventually want to have a `map_err(|e| { self.invalidate(); e })` on all the public APIs.

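The thread above converges on treating any error as invalidating the structure and wrapping the public APIs with `map_err`. A minimal sketch of that pattern, assuming a hypothetical `Guarded` type with an `is_invalid` flag and an `invalidate()` method (illustrative names, not part of treedirstate):

```rust
use std::io;

/// Hypothetical structure illustrating "any error invalidates the structure".
/// `Guarded`, `invalidate` and `is_invalid` are illustrative names only.
struct Guarded {
    is_invalid: bool,
    value: u32,
}

impl Guarded {
    fn invalidate(&mut self) {
        self.is_invalid = true;
    }

    /// Inner operation that mutates state and then may fail, leaving the
    /// structure half-updated (like a failed `write_full`).
    fn step(&mut self, fail: bool) -> io::Result<u32> {
        self.value += 1;
        if fail {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated I/O failure"));
        }
        Ok(self.value)
    }

    /// Public API: reject calls on an invalid structure, and invalidate it
    /// if the inner operation returns an error.
    fn public_op(&mut self, fail: bool) -> io::Result<u32> {
        if self.is_invalid {
            return Err(io::Error::new(io::ErrorKind::Other, "structure is invalid"));
        }
        let result = self.step(fail);
        result.map_err(|e| {
            self.invalidate();
            e
        })
    }
}

fn main() {
    let mut g = Guarded { is_invalid: false, value: 0 };
    assert_eq!(g.public_op(false).unwrap(), 1);
    assert!(g.public_op(true).is_err());
    assert!(g.is_invalid);
    // Later calls fail fast instead of operating on half-updated state.
    assert!(g.public_op(false).is_err());
}
```

Once invalidated, the sketch simply rejects further calls; what a long-lived daemon should do next (for example, reload from disk) is not decided in this review.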
rust/treedirstate/src/filestore.rs
28–46	What about using `BufReader` to do the buffering? If you only want to do buffering sometimes, you could switch `file` between being `BufReader<BufWriter<File>>` and `BufWriter<File>` with an enum. That way you could be reasonably sure that the read and write sides are always in sync, rather than having to maintain a separate cache.
121–136	Should this invalidate the cache?
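With std's types, `BufWriter` does not implement `Read`, so a literal `BufReader<BufWriter<File>>` would not compile. A sketch of the underlying idea from jsgf's suggestion, assuming a hypothetical `Buffered` enum that owns the handle in either read-buffered or write-buffered mode and flushes when switching; `Cursor<Vec<u8>>` stands in for `File`:

```rust
use std::io::{self, BufReader, BufWriter, Cursor, Read, Seek, SeekFrom, Write};

/// Hypothetical handle that is either read-buffered or write-buffered, never both,
/// so the two buffers cannot disagree about the file contents.
enum Buffered<F: Read + Write + Seek> {
    Reading(BufReader<F>),
    Writing(BufWriter<F>),
    /// Left behind if switching modes fails; the handle is then unusable.
    Invalid,
}

impl<F: Read + Write + Seek> Buffered<F> {
    fn invalid() -> io::Error {
        io::Error::new(io::ErrorKind::Other, "handle invalidated by an earlier error")
    }

    /// Switch to read buffering, flushing pending writes first.
    fn reader(&mut self) -> io::Result<&mut BufReader<F>> {
        if let Buffered::Writing(_) = *self {
            if let Buffered::Writing(w) = std::mem::replace(self, Buffered::Invalid) {
                // `into_inner` flushes the write buffer before returning the handle.
                let inner = w.into_inner().map_err(|e| e.into_error())?;
                *self = Buffered::Reading(BufReader::new(inner));
            }
        }
        match self {
            Buffered::Reading(r) => Ok(r),
            _ => Err(Self::invalid()),
        }
    }

    /// Switch to write buffering. Dropping the read buffer may leave the OS
    /// position ahead of the logical one, so seek to the end explicitly:
    /// this store only ever appends.
    fn writer(&mut self) -> io::Result<&mut BufWriter<F>> {
        if let Buffered::Reading(_) = *self {
            if let Buffered::Reading(r) = std::mem::replace(self, Buffered::Invalid) {
                let mut inner = r.into_inner();
                inner.seek(SeekFrom::End(0))?;
                *self = Buffered::Writing(BufWriter::new(inner));
            }
        }
        match self {
            Buffered::Writing(w) => Ok(w),
            _ => Err(Self::invalid()),
        }
    }
}

fn main() -> io::Result<()> {
    let mut handle = Buffered::Writing(BufWriter::new(Cursor::new(Vec::<u8>::new())));
    handle.writer()?.write_all(b"hello ")?;

    // Switching to reading flushes "hello " so the reader can see it.
    let r = handle.reader()?;
    r.seek(SeekFrom::Start(0))?;
    let mut s = String::new();
    r.read_to_string(&mut s)?;
    assert_eq!(s, "hello ");

    // Switching back repositions at the end before appending.
    handle.writer()?.write_all(b"world")?;
    let r = handle.reader()?;
    r.seek(SeekFrom::Start(0))?;
    s.clear();
    r.read_to_string(&mut s)?;
    assert_eq!(s, "hello world");
    Ok(())
}
```

The trade-off versus the separate cache in this diff: each mode switch flushes or discards a buffer, so interleaved reads and appends pay for it, whereas an immutable prefix cache does not.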

rust/treedirstate/src/filestore.rs
121–136	No need. The cache is still valid for all previous blocks (they're immutable), so we can keep it around.
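mbthomas's answer rests on blocks being append-only: a snapshot of the file prefix never goes stale, so IDs inside the snapshot can be served from memory and newer IDs fall through to the file. A minimal in-memory sketch of that reasoning, assuming a hypothetical `MemStore` where a `Vec<u8>` stands in for the file and the same 4-byte big-endian length prefix is used:

```rust
/// Hypothetical in-memory stand-in for FileStore: `data` plays the role of the
/// file, and `cache` is a snapshot of its prefix taken by `cache()`.
struct MemStore {
    data: Vec<u8>,
    cache: Option<Vec<u8>>,
}

impl MemStore {
    fn new() -> Self {
        MemStore { data: Vec::new(), cache: None }
    }

    /// Append a length-prefixed block and return its offset (the block ID).
    /// Existing bytes are never rewritten, so a cached prefix stays valid.
    fn append(&mut self, block: &[u8]) -> usize {
        let id = self.data.len();
        self.data.extend_from_slice(&(block.len() as u32).to_be_bytes());
        self.data.extend_from_slice(block);
        id
    }

    /// Snapshot everything written so far (idempotent, like `FileStore::cache`).
    fn cache(&mut self) {
        if self.cache.is_none() {
            self.cache = Some(self.data.clone());
        }
    }

    /// Read block `id`, serving it from the cache when the ID falls inside the
    /// snapshot. The bool reports whether the cache was used.
    fn read(&self, id: usize) -> Option<(&[u8], bool)> {
        match &self.cache {
            Some(cache) if id < cache.len() => decode(cache, id).map(|b| (b, true)),
            _ => decode(&self.data, id).map(|b| (b, false)),
        }
    }
}

/// Bounds-checked decode of the length-prefixed block starting at `id`.
/// Returns None if the prefix or the payload would run past the buffer.
fn decode(buf: &[u8], id: usize) -> Option<&[u8]> {
    let start = id.checked_add(4)?;
    let len = buf.get(id..start)?;
    let size = u32::from_be_bytes([len[0], len[1], len[2], len[3]]) as usize;
    buf.get(start..start.checked_add(size)?)
}

fn main() {
    let mut store = MemStore::new();
    let a = store.append(b"alpha");
    store.cache();
    // Appending after caching needs no invalidation: old bytes are unchanged.
    let b = store.append(b"beta");
    assert_eq!(store.read(a), Some((&b"alpha"[..], true)));
    assert_eq!(store.read(b), Some((&b"beta"[..], false)));
    // ID 7 would need bytes past the end of the snapshot, so it is rejected.
    assert_eq!(store.read(7), None);
}
```

For the file-backed `cache()`, allocating with `vec![0u8; len]` before `read_exact` would avoid the `unsafe` `set_len`, at the cost of zeroing the buffer once.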

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	3482		Nov 14 2017, 12:39 PM	★	★
Diff 2	3599		Nov 17 2017, 9:45 AM	★	★
Diff 3	3654		Nov 20 2017, 12:20 PM	★	★
Diff 4	3729		Nov 21 2017, 1:35 PM	★	★
Diff 5	3770		Nov 22 2017, 1:14 PM	★	★
Diff 6	3823		Nov 24 2017, 11:53 AM	★	★
Diff 7	3841		Nov 24 2017, 3:18 PM	★	★
Diff 8	3931	rFBHGX3d30572a3984209941747857fe8a99d49c4a82a1	Nov 28 2017, 7:51 AM	★	★

Status	Author	Revision
Closed	mbthomas	D1528 treedirstate: prevent interference with other dirstate implementations
Closed	mbthomas	D1512 treedirstate: de-genericize Dirstate
Closed	mbthomas	D1510 treedirstate: extract serialization methods to separate module
Closed	mbthomas	D1490 treedirstate: add integration tests
Closed	mbthomas	D1434 treedirstate: add configuration options
Closed	mbthomas	D1412 setup: build treedirstate and rusttreedirstate packages
Closed	mbthomas	D1411 distutils_rust: add distutils extension to compile Rust extension modules
Closed	mbthomas	D1410 perftweaks: support treedirstate maps
Closed	mbthomas	D1409 treedirstate: implement casefolding maps for case insensitive filesystems
Closed	mbthomas	D1408 treedirstate: use vlqencoding for numbers
Closed	mbthomas	D1407 treedirstate: auto-repack treedirstate once it reaches 3x its original size
Closed	mbthomas	D1406 treedirstate: implement efficient case collision detection
Closed	mbthomas	D1405 treedirstate: allow absent non-normal sets
Closed	mbthomas	D1404 treedirstate: clear ambiguous times when writing the dirstate
Closed	mbthomas	D1403 treedirstate: better iteration using visitor pattern
Closed	mbthomas	D1402 treedirstate: cache dirstate data when iterating all files
Closed	mbthomas	D1401 treedirstate: add Python linkage
Closed	mbthomas	D1400 treedirstate: add Dirstate
Closed	mbthomas	D1399 treedirstate: add Tree
Closed	mbthomas	D1398 treedirstate: add FileStore
Closed	mbthomas	D1397 treedirstate: add Store and StoreView traits
Closed	mbthomas	D1396 treedirstate: add vecmap implementation
Closed	mbthomas	D1395 treedirstate: create empty Rust project

Diff 3482

rust/treedirstate/src/dirstate.rs

     }

     pub fn store_view<'a>(&'a self) -> &'a StoreView {
         match *self {
             Backend::Empty(ref null) => null,
             Backend::File(ref file) => file,
         }
     }

+    pub fn cache(&mut self) -> Result<()> {
+        match *self {
+            Backend::Empty(ref _null) => Ok(()),
+            Backend::File(ref mut file) => file.cache(),
+        }
+    }
 }

 /// A dirstate object. This contains the state of all files in the dirstate, stored in tree
 /// structures, and backed by an append-only store on disk.
 pub struct Dirstate<T> {
     /// The store currently in use by the Dirstate.
     store: Backend,


     /// Get an entry from the tracked tree.
     pub fn get_tracked<'a>(&'a mut self, name: KeyRef) -> Result<Option<&'a T>> {
         self.tracked.get(self.store.store_view(), name)
     }

     /// Get the name and state of the first file in the tracked tree.
     pub fn get_first_tracked<'a>(&'a mut self) -> Result<Option<(Key, &'a T)>> {
+        self.store.cache()?;
         self.tracked.get_first(self.store.store_view())
     }

     /// Get the name and state of the next file in the tracked tree after the named file.
     pub fn get_next_tracked<'a>(&'a mut self, name: KeyRef) -> Result<Option<(Key, &'a T)>> {
         self.tracked.get_next(self.store.store_view(), name)
     }

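The `dirstate.rs` change warms the cache in `get_first_tracked`, so a full iteration pays for one bulk read instead of a seek and read per block, and `Backend::cache` is a no-op for the empty backend. A toy sketch of that trade-off, assuming a hypothetical `CountingFile` that only counts simulated disk reads (the real store reads bytes, not pre-split blocks):

```rust
use std::cell::Cell;

/// Hypothetical file-backed store that only counts simulated disk reads.
struct CountingFile {
    blocks: Vec<Vec<u8>>,
    cached: bool,
    disk_reads: Cell<usize>,
}

impl CountingFile {
    /// One bulk read of the whole file; a no-op once cached.
    fn cache(&mut self) {
        if !self.cached {
            self.disk_reads.set(self.disk_reads.get() + 1);
            self.cached = true;
        }
    }

    /// Uncached reads cost one disk access each; cached reads are free.
    fn read(&self, id: usize) -> Option<&[u8]> {
        if !self.cached {
            self.disk_reads.set(self.disk_reads.get() + 1);
        }
        self.blocks.get(id).map(|b| b.as_slice())
    }
}

/// Same shape as the diff's `Backend`: caching is a no-op for the empty store.
enum Backend {
    Empty,
    File(CountingFile),
}

impl Backend {
    fn cache(&mut self) {
        match self {
            Backend::Empty => {}
            Backend::File(file) => file.cache(),
        }
    }

    fn read(&self, id: usize) -> Option<&[u8]> {
        match self {
            Backend::Empty => None,
            Backend::File(file) => file.read(id),
        }
    }

    fn disk_reads(&self) -> usize {
        match self {
            Backend::Empty => 0,
            Backend::File(file) => file.disk_reads.get(),
        }
    }
}

/// Visit blocks 0..n, optionally warming the cache first as `get_first_tracked` does.
fn visit_all(backend: &mut Backend, n: usize, warm: bool) -> usize {
    if warm {
        backend.cache();
    }
    let mut bytes = 0;
    for id in 0..n {
        if let Some(block) = backend.read(id) {
            bytes += block.len();
        }
    }
    bytes
}

fn file_backend() -> Backend {
    let blocks = vec![b"a".to_vec(), b"bb".to_vec(), b"ccc".to_vec()];
    Backend::File(CountingFile { blocks, cached: false, disk_reads: Cell::new(0) })
}

fn main() {
    let mut cold = file_backend();
    assert_eq!(visit_all(&mut cold, 3, false), 6);
    assert_eq!(cold.disk_reads(), 3); // one access per block

    let mut warm = file_backend();
    assert_eq!(visit_all(&mut warm, 3, true), 6);
    assert_eq!(warm.disk_reads(), 1); // one bulk read

    let mut empty = Backend::Empty;
    assert_eq!(visit_all(&mut empty, 3, true), 0);
}
```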
rust/treedirstate/src/filestore.rs

 // Copyright Facebook, Inc. 2017
 //! Implementation of a store using file I/O.

-use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
+use byteorder::{ByteOrder, BigEndian, ReadBytesExt, WriteBytesExt};
 use errors::*;
 use std::borrow::Cow;
 use std::cell::RefCell;
 use std::fs::File;
 use std::fs::OpenOptions;
 use std::io::{BufWriter, Read, Seek, SeekFrom, Write};
 use std::path::Path;
 use store::{BlockId, Store, StoreView};

 // File storage format:
 //
 // Header: Magic string: 'appendstore\n'
 // Version: BigEndian u32 (Current version: 1)
 //
 // Entries: Length: BigEndian u32
 // Data: "Length" bytes of data

 const MAGIC: &[u8] = b"appendstore\n";
 const MAGIC_LEN: usize = 12;
 const VERSION: u32 = 1;
 const HEADER_LEN: usize = MAGIC_LEN + 4;

 /// Implementation of a store using file I/O to read and write blocks to a file.
 pub struct FileStore {
     /// The underlying file. This is stored in a RefCell so that we can seek during reads.
     file: RefCell<BufWriter<File>>,

     /// The position in the file to which new items will be written.
     position: u64,

     /// Whether the file handle is currently at the end of the file. This is used to avoid seeking
     /// to the end each time a block is written, as seeking causes the BufWrite to flush, which
     /// hurts performance. This is stored in a RefCell so that we can seek away from the end
     /// during reads.
     at_end: RefCell<bool>,

     /// True if the file is read-only.
     read_only: bool,

+    /// Cache of data loaded from disk. Used when iterating over the whole dirstate.
+    cache: Option<Vec<u8>>,
 }
jsgf: What about using `BufReader` to do the buffering? If you only want to do buffering sometimes, you could switch `file` between being `BufReader<BufWriter<File>>` and `BufWriter<File>` with an enum. That way you could be reasonably sure that the read and write sides are always in sync, rather than having to maintain a separate cache.

 impl FileStore {
     /// Create a new FileStore, overwriting any existing file.
     pub fn create<P: AsRef<Path>>(path: P) -> Result<FileStore> {
         let mut file = BufWriter::new(OpenOptions::new()
             .read(true)
             .write(true)
             .create(true)
             .truncate(true)
             .open(path)?);
         file.write(MAGIC)?;
         file.write_u32::<BigEndian>(VERSION)?;
         Ok(FileStore {
             file: RefCell::new(file),
             position: HEADER_LEN as u64,
             at_end: RefCell::new(true),
             read_only: false,
+            cache: None,
         })
     }

     /// Open an existing FileStore.
     pub fn open<P: AsRef<Path>>(path: P) -> Result<FileStore> {
         let mut read_only = false;
         let file = OpenOptions::new().read(true).write(true).open(&path).or_else(|_e| {
             read_only = true;
         // by seeking to the end.
         let position = file.seek(SeekFrom::End(0))?;

         Ok(FileStore {
             file: RefCell::new(file),
             position,
             at_end: RefCell::new(true),
             read_only,
+            cache: None,
         })
     }

+    pub fn cache(&mut self) -> Result<()> {
+        if self.cache.is_none() {
+            let file = self.file.get_mut();
+            file.flush()?;
+            file.seek(SeekFrom::Start(0))?;
durham: Do we need to set `at_end = false` right after the seek to 0, so that if any of the lines below (in particular line 118) exit early, we aren't left with the file positioned at 0 and `at_end == true`? Or do we consider the whole store invalid if this returns an error?
mbthomas (author): I think in practice if any of this fails we end up with an exception in Python and hg will exit, but we should keep it consistent in case someone later on decides to handle these errors.
durham: I think we want to explicitly not depend on process exit for cleanup in the design of these new structures. We want daemon support as a first-class citizen. Perhaps in this case it means that if a caller ever receives an error, it should consider the structure invalid. Is there any convenient shorthand for something that can set `self.is_invalid = true` if a function returns an error? Kind of like what try/catch would allow.
mbthomas (author): Yes, I think errors most likely invalidate the structure. If we're in the middle of doing a `write_full`, for example, half the in-memory nodes will have IDs from the old store, and half will have IDs from the new store. `Result::map_err` and `Result::or_else` can be used to perform clean-ups like this. I think at a higher level (maybe in `Dirstate`?) we'll eventually want to have a `map_err(|e| { self.invalidate(); e })` on all the public APIs.
+            let mut buffer = Vec::with_capacity(self.position as usize);
+            unsafe {
+                // This is safe as we've just allocated the buffer and are about to read into it.
+                buffer.set_len(self.position as usize);
+            }
+            file.get_mut().read_exact(buffer.as_mut_slice())?;
+            file.seek(SeekFrom::Start(self.position))?;
+            *self.at_end.get_mut() = true;
+            self.cache = Some(buffer);
+        }
+        Ok(())
+    }
 }

 impl Store for FileStore {
     fn append(&mut self, data: &[u8]) -> Result<BlockId> {
         if self.read_only {
             unimplemented!();
         }
         let id = self.position as BlockId;
         let file = self.file.get_mut();
         let at_end = self.at_end.get_mut();
         if !*at_end {
             file.seek(SeekFrom::Start(self.position))?;
             *at_end = true;
         }
         file.write_u32::<BigEndian>(data.len() as u32)?;
         self.position += 4;
         self.position += file.write(data)? as u64;
         Ok(id)
     }
jsgf: Should this invalidate the cache?
mbthomas (author): No need. The cache is still valid for all previous blocks (they're immutable), so we can keep it around.

     fn flush(&mut self) -> Result<()> {
         self.file.get_mut().flush()?;
         Ok(())
     }
 }

 impl StoreView for FileStore {
     fn read<'a>(&'a self, id: BlockId) -> Result<Cow<'a, [u8]>> {
         // Check the ID is in range.
         if id < HEADER_LEN || id as u64 >= self.position {
             bail!(ErrorKind::InvalidStoreId(id));
         }

+        if let Some(ref cache) = self.cache {
+            if (id as u64) < cache.len() as u64 {
+                if (id as u64) > cache.len() as u64 - 4 {
+                    // The ID falls in the last 3 bytes of the cache. This is invalid.
+                    bail!(ErrorKind::InvalidStoreId(id));
+                }
+                let start = id as usize + 4;
+                let size = BigEndian::read_u32(&cache[id as usize..start]) as usize;
+                if size as u64 > cache.len() as u64 - start as u64 {
+                    // The stored size of this block exceeds the number of bytes left in the
+                    // cache. We must have been given an invalid ID.
+                    bail!(ErrorKind::InvalidStoreId(id));
+                }
+                return Ok(Cow::from(&cache[start..start + size]))
+            }
+        }

         // Get mutable access to the file, and seek to the right location.
         let mut file = self.file.borrow_mut();
         file.seek(SeekFrom::Start(id as u64))?;
         *self.at_end.borrow_mut() = false;

         // Read the block of data from the file.
         let size = file.get_mut().read_u32::<BigEndian>()?;
         if size as u64 > self.position - id as u64 {

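The format comment in `filestore.rs` describes a 16-byte header (the 12-byte magic `appendstore\n` plus a big-endian `u32` version) followed by length-prefixed entries. Whether and how `open` validates the header is not visible in this diff, so the check below is an assumption; `write_header`, `check_header` and `append` are hypothetical helpers over any `Read + Write + Seek` handle, with `Cursor<Vec<u8>>` standing in for a file. The sketch uses `write_all` throughout, because `Write::write` is allowed to write only part of its buffer, whereas the diff's `create` and `append` call `write`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

// Constants as described in the filestore.rs format comment.
const MAGIC: &[u8] = b"appendstore\n";
const VERSION: u32 = 1;
const HEADER_LEN: u64 = MAGIC.len() as u64 + 4;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Write the 16-byte header: magic string, then the big-endian version.
fn write_header<W: Write>(w: &mut W) -> io::Result<()> {
    w.write_all(MAGIC)?;
    w.write_all(&VERSION.to_be_bytes())
}

/// Hypothetical header check; on success returns the append position (EOF).
fn check_header<F: Read + Seek>(f: &mut F) -> io::Result<u64> {
    f.seek(SeekFrom::Start(0))?;
    let mut header = [0u8; HEADER_LEN as usize];
    f.read_exact(&mut header)?;
    let (magic, version) = header.split_at(MAGIC.len());
    if magic != MAGIC {
        return Err(invalid("not an appendstore file"));
    }
    let version = u32::from_be_bytes([version[0], version[1], version[2], version[3]]);
    if version != VERSION {
        return Err(invalid("unsupported appendstore version"));
    }
    f.seek(SeekFrom::End(0))
}

/// Append one entry (big-endian u32 length, then the data) and return its offset.
fn append<W: Write + Seek>(w: &mut W, data: &[u8]) -> io::Result<u64> {
    if data.len() > u32::MAX as usize {
        return Err(invalid("block too large for a u32 length prefix"));
    }
    let id = w.seek(SeekFrom::End(0))?;
    w.write_all(&(data.len() as u32).to_be_bytes())?;
    w.write_all(data)?;
    Ok(id)
}

fn main() -> io::Result<()> {
    let mut f = Cursor::new(Vec::<u8>::new());
    write_header(&mut f)?;
    assert_eq!(check_header(&mut f)?, HEADER_LEN); // empty store: EOF right after header
    let id = append(&mut f, b"node")?;
    assert_eq!(id, HEADER_LEN); // first block starts right after the header
    assert_eq!(f.get_ref().len() as u64, HEADER_LEN + 4 + 4);

    let mut bad = Cursor::new(b"notastore!!!\x00\x00\x00\x01".to_vec());
    assert!(check_header(&mut bad).is_err());
    Ok(())
}
```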
			Path	Packages
M			rust/treedirstate/src/dirstate.rs (8 lines)
M			rust/treedirstate/src/filestore.rs (42 lines)
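The `at_end` field in `FileStore` exists because seeking a `BufWriter` flushes its buffer. A sketch of the same bookkeeping, assuming a hypothetical `Appender` (no header, `&mut self` reads instead of the diff's `RefCell`, and a `seeks` counter added purely for illustration): back-to-back appends never seek, and only the first append after a read seeks back to the end.

```rust
use std::io::{self, BufWriter, Cursor, Read, Seek, SeekFrom, Write};

/// Hypothetical minimal store mirroring FileStore's `position`/`at_end` bookkeeping.
/// `seeks` is instrumentation for this sketch only. No header, for brevity.
struct Appender<F: Read + Write + Seek> {
    file: BufWriter<F>,
    position: u64,
    at_end: bool,
    seeks: usize,
}

impl<F: Read + Write + Seek> Appender<F> {
    fn new(inner: F) -> Self {
        Appender { file: BufWriter::new(inner), position: 0, at_end: true, seeks: 0 }
    }

    fn append(&mut self, data: &[u8]) -> io::Result<u64> {
        let id = self.position;
        if !self.at_end {
            // A read moved the handle; seeking back also flushes the BufWriter.
            self.file.seek(SeekFrom::Start(self.position))?;
            self.seeks += 1;
            self.at_end = true;
        }
        self.file.write_all(&(data.len() as u32).to_be_bytes())?;
        self.file.write_all(data)?;
        self.position += 4 + data.len() as u64;
        Ok(id)
    }

    fn read(&mut self, id: u64) -> io::Result<Vec<u8>> {
        // Seeking flushes pending writes, so reading the inner handle sees them.
        self.file.seek(SeekFrom::Start(id))?;
        self.seeks += 1;
        self.at_end = false;
        let inner = self.file.get_mut();
        let mut len = [0u8; 4];
        inner.read_exact(&mut len)?;
        let mut block = vec![0u8; u32::from_be_bytes(len) as usize];
        inner.read_exact(&mut block)?;
        Ok(block)
    }
}

fn main() -> io::Result<()> {
    let mut s = Appender::new(Cursor::new(Vec::<u8>::new()));
    let a = s.append(b"one")?;
    let b = s.append(b"two")?;
    assert_eq!(s.seeks, 0); // back-to-back appends stay buffered, no seek
    assert_eq!(s.read(a)?, b"one");
    let c = s.append(b"three")?; // first append after a read seeks once
    assert_eq!(s.seeks, 2);
    assert_eq!(s.read(b)?, b"two");
    assert_eq!(s.read(c)?, b"three");
    Ok(())
}
```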

This is an archive of the discontinued Mercurial Phabricator instance.

treedirstate: cache dirstate data when iterating all files
Closed, Public
