This is an archive of the discontinued Mercurial Phabricator instance.

procutil: support for obtaining an importlib.abc.ResourceReader
Changes Planned, Public

Authored by indygreg on Nov 14 2019, 10:58 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary

Python 3.7's importing mechanism offers a new "resource reader"
API that allows module loaders to provide an object which can be
used for obtaining information and data about "resources" in a
Python package. This API is the new recommended way for modules
to load data for non-module entities, such as text files and
other support files residing next to sources. This functionality
is similar to pkg_resources (which has existed for a long time)
and importlib.abc.ResourceLoader, which was introduced in Python 3
and is deprecated in favor of ResourceReader.

Using a "resource reader" is preferable to pkg_resources
because it is the newer API and is faster. Both are preferable to
reading files directly from the filesystem because they allow
resources to be stored in places other than the filesystem, such
as zip files or single-file binaries.

This commit introduces a "resource reader" API into the procutil
module.

We introduce an importlib.abc.ResourceReader compatible class
which can load resources from the filesystem.
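
For illustration, a filesystem-backed reader could look roughly like
the sketch below. This is not the code from this commit: the class
name `FilesystemResourceReader`, the demo directory and the demo file
are made up for the example, and the class only duck-types the four
methods of importlib.abc.ResourceReader instead of subclassing it.

```python
import os
import tempfile


class FilesystemResourceReader:
    """Duck-types importlib.abc.ResourceReader for a directory on disk.

    Hypothetical name; not the class added by this commit.
    """

    def __init__(self, package_dir):
        self._dir = package_dir

    def _path(self, resource):
        # Resources are direct children of the package, so reject
        # anything that looks like a nested path.
        if os.path.basename(resource) != resource:
            raise ValueError("invalid resource name: %r" % (resource,))
        return os.path.join(self._dir, resource)

    def open_resource(self, resource):
        # Binary mode; raises FileNotFoundError if the file is missing.
        return open(self._path(resource), "rb")

    def resource_path(self, resource):
        path = self._path(resource)
        if not os.path.isfile(path):
            raise FileNotFoundError(resource)
        return path

    def is_resource(self, name):
        return os.path.isfile(self._path(name))

    def contents(self):
        return iter(os.listdir(self._dir))


# Demo against a throwaway directory standing in for a package.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "map-cmdline.default"), "wb") as fh:
    fh.write(b"template data\n")

reader = FilesystemResourceReader(demo_dir)
with reader.open_resource("map-cmdline.default") as fh:
    data = fh.read()
```

Callers only ever name a resource; where the bytes come from stays
behind the four methods, which is what lets a different reader serve
the same names from somewhere other than the filesystem.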

We introduce another class that wraps an existing ResourceReader
so callers don't need to worry about bytes/str differences.
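
One way such a wrapper might work is sketched below. The names
`BytesFriendlyReader` and `InMemoryReader` are hypothetical, and the
choice to hand names back to callers as bytes is an assumption made
for the example, not something this summary specifies. resource_path()
is left out to keep the sketch short.

```python
import io
import os


class InMemoryReader:
    """Hypothetical stand-in for a reader that only understands str names."""

    def __init__(self, files):
        self._files = files  # maps str name -> bytes content

    def open_resource(self, resource):
        if resource not in self._files:
            raise FileNotFoundError(resource)
        return io.BytesIO(self._files[resource])

    def is_resource(self, name):
        return name in self._files

    def contents(self):
        return iter(self._files)


class BytesFriendlyReader:
    """Accepts bytes or str names and hands the inner reader str.

    Assumption: names are returned to callers as bytes.
    """

    def __init__(self, inner):
        self._inner = inner

    def open_resource(self, resource):
        # os.fsdecode() passes str through unchanged and decodes bytes.
        return self._inner.open_resource(os.fsdecode(resource))

    def is_resource(self, name):
        return self._inner.is_resource(os.fsdecode(name))

    def contents(self):
        return (os.fsencode(name) for name in self._inner.contents())


wrapped = BytesFriendlyReader(InMemoryReader({"help.txt": b"usage\n"}))
```

With this in place, `wrapped.is_resource(b"help.txt")` and
`wrapped.is_resource("help.txt")` both reach the inner reader with the
same str name.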

Finally, there is a new "resourcereader()" function that returns a
ResourceReader for a given Python package. The idea is that
code will call `pycompat.resourcereader(__package__)` (or
similar) to obtain an object conforming to
importlib.abc.ResourceReader. From there on, the code will look
as it would if it were written natively for Python 3.7.
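
A minimal sketch of how such a lookup function might behave, assuming
it prefers the loader's own get_resource_reader() (available on
Python 3.7+ loaders) and otherwise falls back to the directory holding
the package. The `_FallbackReader` class is hypothetical and the
function body is not the code from this commit. The stdlib `json`
package is used only as a convenient package to query.

```python
import importlib
import os


class _FallbackReader:
    """Minimal filesystem fallback (hypothetical, demo-sized)."""

    def __init__(self, directory):
        self._dir = directory

    def open_resource(self, resource):
        return open(os.path.join(self._dir, resource), "rb")

    def is_resource(self, name):
        return os.path.isfile(os.path.join(self._dir, name))

    def contents(self):
        return iter(os.listdir(self._dir))


def resourcereader(package):
    """Return a ResourceReader-like object for the named package."""
    module = importlib.import_module(package)
    loader = getattr(getattr(module, "__spec__", None), "loader", None)
    get_reader = getattr(loader, "get_resource_reader", None)
    if get_reader is not None:
        # Loaders may return None when they cannot serve resources.
        reader = get_reader(package)
        if reader is not None:
            return reader
    # Assumes a regular package with __file__ set; namespace packages
    # are not handled in this sketch.
    return _FallbackReader(os.path.dirname(module.__file__))


reader = resourcereader("json")
has_init = reader.is_resource("__init__.py")
```

Inside a package, the call would typically be
`resourcereader(__package__)` so the code never needs to know where
its own files live.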

The ultimate goal is to remove the dependence on __file__
and util.datapath so non-module data file access is abstract
and doesn't need to be serviced by a traditional filesystem.

The new code will return bad results for py2exe and macOS
application builds because resource files live in immediate
subdirectories of the binary instead of next to the source.
This will be addressed in future commits.

# no-check-commit because we must use foo_bar naming

Diff Detail

Repository
rHG Mercurial
Branch
default
Lint
No Linters Available
Unit
No Unit Test Coverage

Event Timeline

indygreg created this revision. Nov 14 2019, 10:58 PM
indygreg planned changes to this revision. Nov 14 2019, 11:01 PM

@martinvonz do you want to pick up this series or do you want me to rebase it on top of packageutil? If you leave it up to me, I'm not sure when I'll get around to it. (I'd prefer to focus my time on shoring up PyOxidizer.)

If you do pick this up, an alternative to this approach would be to vendor importlib_resources (https://importlib-resources.readthedocs.io/en/latest/index.html) or consider using its code/approach instead of rolling so much of a custom solution. That being said, I feel like I already wrote enough of the code to support the modern resources API and Mercurial would require sufficient changes for things to work in the domain of bytes that importing importlib_resources doesn't seem worthwhile.

If you do take this series, something to consider for templates is the implication for users wanting to edit templates. If templates are embedded in the binary, how do users define their own custom versions of the built-in templates? We already have a permissions problem when the provided templates are read-only and the user can't edit them in place. But with PyOxidizer, if templates are embedded in the binary, the user won't even have access to their raw content! We would likely need to supply an hg debugextracttemplates command or something to allow the user to obtain a copy of the embedded templates.