diff --git a/contrib/clang-format-ignorelist b/contrib/clang-format-ignorelist
--- a/contrib/clang-format-ignorelist
+++ b/contrib/clang-format-ignorelist
@@ -62,6 +62,11 @@
 contrib/python-zstandard/zstd/compress/zstd_opt.c
 contrib/python-zstandard/zstd/compress/zstd_opt.h
 contrib/python-zstandard/zstd/decompress/huf_decompress.c
+contrib/python-zstandard/zstd/decompress/zstd_ddict.c
+contrib/python-zstandard/zstd/decompress/zstd_ddict.h
+contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c
+contrib/python-zstandard/zstd/decompress/zstd_decompress_block.h
+contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h
 contrib/python-zstandard/zstd/decompress/zstd_decompress.c
 contrib/python-zstandard/zstd/deprecated/zbuff_common.c
 contrib/python-zstandard/zstd/deprecated/zbuff_compress.c
diff --git a/contrib/python-zstandard/MANIFEST.in b/contrib/python-zstandard/MANIFEST.in
--- a/contrib/python-zstandard/MANIFEST.in
+++ b/contrib/python-zstandard/MANIFEST.in
@@ -5,6 +5,5 @@
 include make_cffi.py
 include setup_zstd.py
 include zstd.c
-include zstd_cffi.py
 include LICENSE
 include NEWS.rst
diff --git a/contrib/python-zstandard/NEWS.rst b/contrib/python-zstandard/NEWS.rst
--- a/contrib/python-zstandard/NEWS.rst
+++ b/contrib/python-zstandard/NEWS.rst
@@ -8,8 +8,18 @@
 Actions Blocking Release
 ------------------------
 
-* compression and decompression APIs that support ``io.rawIOBase`` interface
+* compression and decompression APIs that support ``io.RawIOBase`` interface
   (#13).
+* ``stream_writer()`` APIs should support ``io.RawIOBase`` interface.
+* Properly handle non-blocking I/O and partial writes for objects implementing
+  ``io.RawIOBase``.
+* Make ``write_return_read=True`` the default for objects implementing
+  ``io.RawIOBase``.
+* Audit for consistent and proper behavior of ``flush()`` and ``close()`` for
+  all objects implementing ``io.RawIOBase``. Is calling ``close()`` on
+  wrapped stream acceptable, should ``__exit__`` always call ``close()``,
+  should ``close()`` imply ``flush()``, etc.
+* Consider making reads across frames configurable behavior.
 * Refactor module names so C and CFFI extensions live under ``zstandard``
   package.
 * Overall API design review.
@@ -43,6 +53,11 @@
 * Consider a ``chunker()`` API for decompression.
 * Consider stats for ``chunker()`` API, including finding the last consumed
   offset of input data.
+* Consider exposing ``ZSTD_cParam_getBounds()`` and
+  ``ZSTD_dParam_getBounds()`` APIs.
+* Consider controls over resetting compression contexts (session only, parameters,
+  or session and parameters).
+* Actually use the CFFI backend in fuzzing tests.
 
 Other Actions Not Blocking Release
 ---------------------------------------
@@ -51,6 +66,207 @@
 * API for ensuring max memory ceiling isn't exceeded.
 * Move off nose for testing.
 
+0.11.0 (released 2019-02-24)
+============================
+
+Backwards Compatibility Nodes
+-----------------------------
+
+* ``ZstdDecompressor.read()`` now allows reading sizes of ``-1`` or ``0``
+  and defaults to ``-1``, per the documented behavior of
+  ``io.RawIOBase.read()``. Previously, we required an argument that was
+  a positive value.
+* The ``readline()``, ``readlines()``, ``__iter__``, and ``__next__`` methods
+  of ``ZstdDecompressionReader()`` now raise ``io.UnsupportedOperation``
+  instead of ``NotImplementedError``.
+* ``ZstdDecompressor.stream_reader()`` now accepts a ``read_across_frames``
+  argument. The default value will likely be changed in a future release
+  and consumers are advised to pass the argument to avoid unwanted change
+  of behavior in the future.
+* ``setup.py`` now always disables the CFFI backend if the installed
+  CFFI package does not meet the minimum version requirements. Before, it was
+  possible for the CFFI backend to be generated and a run-time error to
+  occur.
+* In the CFFI backend, ``CompressionReader`` and ``DecompressionReader``
+  were renamed to ``ZstdCompressionReader`` and ``ZstdDecompressionReader``,
+  respectively so naming is identical to the C extension. This should have
+  no meaningful end-user impact, as instances aren't meant to be
+  constructed directly.
+* ``ZstdDecompressor.stream_writer()`` now accepts a ``write_return_read``
+  argument to control whether ``write()`` returns the number of bytes
+  read from the source / written to the decompressor. It defaults to off,
+  which preserves the existing behavior of returning the number of bytes
+  emitted from the decompressor. The default will change in a future release
+  so behavior aligns with the specified behavior of ``io.RawIOBase``.
+* ``ZstdDecompressionWriter.__exit__`` now calls ``self.close()``. This
+  will result in that stream plus the underlying stream being closed as
+  well. If this behavior is not desirable, do not use instances as
+  context managers.
+* ``ZstdCompressor.stream_writer()`` now accepts a ``write_return_read``
+  argument to control whether ``write()`` returns the number of bytes read
+  from the source / written to the compressor. It defaults to off, which
+  preserves the existing behavior of returning the number of bytes emitted
+  from the compressor. The default will change in a future release so
+  behavior aligns with the specified behavior of ``io.RawIOBase``.
+* ``ZstdCompressionWriter.__exit__`` now calls ``self.close()``. This will
+  result in that stream plus any underlying stream being closed as well. If
+  this behavior is not desirable, do not use instances as context managers.
+* ``ZstdDecompressionWriter`` no longer requires being used as a context
+  manager (#57).
+* ``ZstdCompressionWriter`` no longer requires being used as a context
+  manager (#57).
+* The ``overlap_size_log`` attribute on ``CompressionParameters`` instances
+  has been deprecated and will be removed in a future release. The
+  ``overlap_log`` attribute should be used instead.
+* The ``overlap_size_log`` argument to ``CompressionParameters`` has been
+  deprecated and will be removed in a future release. The ``overlap_log``
+  argument should be used instead.
+* The ``ldm_hash_every_log`` attribute on ``CompressionParameters`` instances
+  has been deprecated and will be removed in a future release. The
+  ``ldm_hash_rate_log`` attribute should be used instead.
+* The ``ldm_hash_every_log`` argument to ``CompressionParameters`` has been
+  deprecated and will be removed in a future release. The ``ldm_hash_rate_log``
+  argument should be used instead.
+* The ``compression_strategy`` argument to ``CompressionParameters`` has been
+  deprecated and will be removed in a future release. The ``strategy``
+  argument should be used instead.
+* The ``SEARCHLENGTH_MIN`` and ``SEARCHLENGTH_MAX`` constants are deprecated
+  and will be removed in a future release. Use ``MINMATCH_MIN`` and
+  ``MINMATCH_MAX`` instead.
+* The ``zstd_cffi`` module has been renamed to ``zstandard.cffi``. As had
+  been documented in the ``README`` file since the ``0.9.0`` release, the
+  module should not be imported directly at its new location. Instead,
+  ``import zstandard`` to cause an appropriate backend module to be loaded
+  automatically.
+
+Bug Fixes
+---------
+
+* CFFI backend could encounter a failure when sending an empty chunk into
+  ``ZstdDecompressionObj.decompress()``. The issue has been fixed.
+* CFFI backend could encounter an error when calling
+  ``ZstdDecompressionReader.read()`` if there was data remaining in an
+  internal buffer. The issue has been fixed. (#71)
+
+Changes
+-------
+
+* ``ZstDecompressionObj.decompress()`` now properly handles empty inputs in
+  the CFFI backend.
+* ``ZstdCompressionReader`` now implements ``read1()`` and ``readinto1()``.
+  These are part of the ``io.BufferedIOBase`` interface.
+* ``ZstdCompressionReader`` has gained a ``readinto(b)`` method for reading
+  compressed output into an existing buffer.
+* ``ZstdCompressionReader.read()`` now defaults to ``size=-1`` and accepts
+  read sizes of ``-1`` and ``0``. The new behavior aligns with the documented
+  behavior of ``io.RawIOBase``.
+* ``ZstdCompressionReader`` now implements ``readall()``. Previously, this
+  method raised ``NotImplementedError``.
+* ``ZstdDecompressionReader`` now implements ``read1()`` and ``readinto1()``.
+  These are part of the ``io.BufferedIOBase`` interface.
+* ``ZstdDecompressionReader.read()`` now defaults to ``size=-1`` and accepts
+  read sizes of ``-1`` and ``0``. The new behavior aligns with the documented
+  behavior of ``io.RawIOBase``.
+* ``ZstdDecompressionReader()`` now implements ``readall()``. Previously, this
+  method raised ``NotImplementedError``.
+* The ``readline()``, ``readlines()``, ``__iter__``, and ``__next__`` methods
+  of ``ZstdDecompressionReader()`` now raise ``io.UnsupportedOperation``
+  instead of ``NotImplementedError``. This reflects a decision to never
+  implement text-based I/O on (de)compressors and keep the low-level API
+  operating in the binary domain. (#13)
+* ``README.rst`` now documented how to achieve linewise iteration using
+  an ``io.TextIOWrapper`` with a ``ZstdDecompressionReader``.
+* ``ZstdDecompressionReader`` has gained a ``readinto(b)`` method for
+  reading decompressed output into an existing buffer. This allows chaining
+  to an ``io.TextIOWrapper`` on Python 3 without using an ``io.BufferedReader``.
+* ``ZstdDecompressor.stream_reader()`` now accepts a ``read_across_frames``
+  argument to control behavior when the input data has multiple zstd
+  *frames*. When ``False`` (the default for backwards compatibility), a
+  ``read()`` will stop when the end of a zstd *frame* is encountered. When
+  ``True``, ``read()`` can potentially return data spanning multiple zstd
+  *frames*. The default will likely be changed to ``True`` in a future
+  release.
+* ``setup.py`` now performs CFFI version sniffing and disables the CFFI
+  backend if CFFI is too old. Previously, we only used ``install_requires``
+  to enforce the CFFI version and not all build modes would properly enforce
+  the minimum CFFI version. (#69)
+* CFFI's ``ZstdDecompressionReader.read()`` now properly handles data
+  remaining in any internal buffer. Before, repeated ``read()`` could
+  result in *random* errors. (#71)
+* Upgraded various Python packages in CI environment.
+* Upgrade to hypothesis 4.5.11.
+* In the CFFI backend, ``CompressionReader`` and ``DecompressionReader``
+  were renamed to ``ZstdCompressionReader`` and ``ZstdDecompressionReader``,
+  respectively.
+* ``ZstdDecompressor.stream_writer()`` now accepts a ``write_return_read``
+  argument to control whether ``write()`` returns the number of bytes read
+  from the source. It defaults to ``False`` to preserve backwards
+  compatibility.
+* ``ZstdDecompressor.stream_writer()`` now implements the ``io.RawIOBase``
+  interface and behaves as a proper stream object.
+* ``ZstdCompressor.stream_writer()`` now accepts a ``write_return_read``
+  argument to control whether ``write()`` returns the number of bytes read
+  from the source. It defaults to ``False`` to preserve backwards
+  compatibility.
+* ``ZstdCompressionWriter`` now implements the ``io.RawIOBase`` interface and
+  behaves as a proper stream object. ``close()`` will now close the stream
+  and the underlying stream (if possible). ``__exit__`` will now call
+  ``close()``. Methods like ``writable()`` and ``fileno()`` are implemented.
+* ``ZstdDecompressionWriter`` no longer must be used as a context manager.
+* ``ZstdCompressionWriter`` no longer must be used as a context manager.
+  When not using as a context manager, it is important to call
+  ``flush(FRAME_FRAME)`` or the compression stream won't be properly
+  terminated and decoders may complain about malformed input.
+* ``ZstdCompressionWriter.flush()`` (what is returned from
+  ``ZstdCompressor.stream_writer()``) now accepts an argument controlling the
+  flush behavior. Its value can be one of the new constants
+  ``FLUSH_BLOCK`` or ``FLUSH_FRAME``.
+* ``ZstdDecompressionObj`` instances now have a ``flush([length=None])`` method.
+  This provides parity with standard library equivalent types. (#65)
+* ``CompressionParameters`` no longer redundantly store individual compression
+  parameters on each instance. Instead, compression parameters are stored inside
+  the underlying ``ZSTD_CCtx_params`` instance. Attributes for obtaining
+  parameters are now properties rather than instance variables.
+* Exposed the ``STRATEGY_BTULTRA2`` constant.
+* ``CompressionParameters`` instances now expose an ``overlap_log`` attribute.
+  This behaves identically to the ``overlap_size_log`` attribute.
+* ``CompressionParameters()`` now accepts an ``overlap_log`` argument that
+  behaves identically to the ``overlap_size_log`` argument. An error will be
+  raised if both arguments are specified.
+* ``CompressionParameters`` instances now expose an ``ldm_hash_rate_log``
+  attribute. This behaves identically to the ``ldm_hash_every_log`` attribute.
+* ``CompressionParameters()`` now accepts a ``ldm_hash_rate_log`` argument that
+  behaves identically to the ``ldm_hash_every_log`` argument. An error will be
+  raised if both arguments are specified.
+* ``CompressionParameters()`` now accepts a ``strategy`` argument that behaves
+  identically to the ``compression_strategy`` argument. An error will be raised
+  if both arguments are specified.
+* The ``MINMATCH_MIN`` and ``MINMATCH_MAX`` constants were added. They are
+  semantically equivalent to the old ``SEARCHLENGTH_MIN`` and
+  ``SEARCHLENGTH_MAX`` constants.
+* Bundled zstandard library upgraded from 1.3.7 to 1.3.8.
+* ``setup.py`` denotes support for Python 3.7 (Python 3.7 was supported and
+  tested in the 0.10 release).
+* ``zstd_cffi`` module has been renamed to ``zstandard.cffi``.
+* ``ZstdCompressor.stream_writer()`` now reuses a buffer in order to avoid
+  allocating a new buffer for every operation. This should result in faster
+  performance in cases where ``write()`` or ``flush()`` are being called
+  frequently. (#62)
+* Bundled zstandard library upgraded from 1.3.6 to 1.3.7.
+
+0.10.2 (released 2018-11-03)
+============================
+
+Bug Fixes
+---------
+
+* ``zstd_cffi.py`` added to ``setup.py`` (#60).
+
+Changes
+-------
+
+* Change some integer casts to avoid ``ssize_t`` (#61).
+
 0.10.1 (released 2018-10-08)
 ============================
 
diff --git a/contrib/python-zstandard/README.rst b/contrib/python-zstandard/README.rst
--- a/contrib/python-zstandard/README.rst
+++ b/contrib/python-zstandard/README.rst
@@ -20,9 +20,9 @@
 Requirements
 ============
 
-This extension is designed to run with Python 2.7, 3.4, 3.5, and 3.6
-on common platforms (Linux, Windows, and OS X). x86 and x86_64 are well-tested
-on Windows. Only x86_64 is well-tested on Linux and macOS.
+This extension is designed to run with Python 2.7, 3.4, 3.5, 3.6, and 3.7
+on common platforms (Linux, Windows, and OS X). On PyPy (both PyPy2 and PyPy3) we support version 6.0.0 and above. 
+x86 and x86_64 are well-tested on Windows. Only x86_64 is well-tested on Linux and macOS.
 
 Installing
 ==========
@@ -215,7 +215,7 @@
 
                # Do something with compressed chunk.
 
-When the context manager exists or ``close()`` is called, the stream is closed,
+When the context manager exits or ``close()`` is called, the stream is closed,
 underlying resources are released, and future operations against the compression
 stream will fail.
 
@@ -251,8 +251,54 @@
 Streaming Input API
 ^^^^^^^^^^^^^^^^^^^
 
-``stream_writer(fh)`` (which behaves as a context manager) allows you to *stream*
-data into a compressor.::
+``stream_writer(fh)`` allows you to *stream* data into a compressor.
+
+Returned instances implement the ``io.RawIOBase`` interface. Only methods
+that involve writing will do useful things.
+
+The argument to ``stream_writer()`` must have a ``write(data)`` method. As
+compressed data is available, ``write()`` will be called with the compressed
+data as its argument. Many common Python types implement ``write()``, including
+open file handles and ``io.BytesIO``.
+
+The ``write(data)`` method is used to feed data into the compressor.
+
+The ``flush([flush_mode=FLUSH_BLOCK])`` method can be called to evict whatever
+data remains within the compressor's internal state into the output object. This
+may result in 0 or more ``write()`` calls to the output object. This method
+accepts an optional ``flush_mode`` argument to control the flushing behavior.
+Its value can be any of the ``FLUSH_*`` constants.
+
+Both ``write()`` and ``flush()`` return the number of bytes written to the
+object's ``write()``. In many cases, small inputs do not accumulate enough
+data to cause a write and ``write()`` will return ``0``.
+
+Calling ``close()`` will mark the stream as closed and subsequent I/O
+operations will raise ``ValueError`` (per the documented behavior of
+``io.RawIOBase``). ``close()`` will also call ``close()`` on the underlying
+stream if such a method exists.
+
+Typically usage is as follows::
+
+   cctx = zstd.ZstdCompressor(level=10)
+   compressor = cctx.stream_writer(fh)
+
+   compressor.write(b'chunk 0\n')
+   compressor.write(b'chunk 1\n')
+   compressor.flush()
+   # Receiver will be able to decode ``chunk 0\nchunk 1\n`` at this point.
+   # Receiver is also expecting more data in the zstd *frame*.
+
+   compressor.write(b'chunk 2\n')
+   compressor.flush(zstd.FLUSH_FRAME)
+   # Receiver will be able to decode ``chunk 0\nchunk 1\nchunk 2``.
+   # Receiver is expecting no more data, as the zstd frame is closed.
+   # Any future calls to ``write()`` at this point will construct a new
+   # zstd frame.
+
+Instances can be used as context managers. Exiting the context manager is
+the equivalent of calling ``close()``, which is equivalent to calling
+``flush(zstd.FLUSH_FRAME)``::
 
    cctx = zstd.ZstdCompressor(level=10)
    with cctx.stream_writer(fh) as compressor:
@@ -260,22 +306,12 @@
        compressor.write(b'chunk 1')
        ...
 
-The argument to ``stream_writer()`` must have a ``write(data)`` method. As
-compressed data is available, ``write()`` will be called with the compressed
-data as its argument. Many common Python types implement ``write()``, including
-open file handles and ``io.BytesIO``.
+.. important::
 
-``stream_writer()`` returns an object representing a streaming compressor
-instance. It **must** be used as a context manager. That object's
-``write(data)`` method is used to feed data into the compressor.
-
-A ``flush()`` method can be called to evict whatever data remains within the
-compressor's internal state into the output object. This may result in 0 or
-more ``write()`` calls to the output object.
-
-Both ``write()`` and ``flush()`` return the number of bytes written to the
-object's ``write()``. In many cases, small inputs do not accumulate enough
-data to cause a write and ``write()`` will return ``0``.
+   If ``flush(FLUSH_FRAME)`` is not called, emitted data doesn't constitute
+   a full zstd *frame* and consumers of this data may complain about malformed
+   input. It is recommended to use instances as a context manager to ensure
+   *frames* are properly finished.
 
 If the size of the data being fed to this streaming compressor is known,
 you can declare it before compression begins::
@@ -310,6 +346,14 @@
         ...
         total_written = compressor.tell()
 
+``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
+the return value of ``write()``. When ``False`` (the default), ``write()`` returns
+the number of bytes that were ``write()``en to the underlying object. When
+``True``, ``write()`` returns the number of bytes read from the input that
+were subsequently written to the compressor. ``True`` is the *proper* behavior
+for ``write()`` as specified by the ``io.RawIOBase`` interface and will become
+the default value in a future release.
+
 Streaming Output API
 ^^^^^^^^^^^^^^^^^^^^
 
@@ -654,27 +698,63 @@
 ``tell()`` returns the number of decompressed bytes read so far.
 
 Not all I/O methods are implemented. Notably missing is support for
-``readline()``, ``readlines()``, and linewise iteration support. Support for
-these is planned for a future release.
+``readline()``, ``readlines()``, and linewise iteration support. This is
+because streams operate on binary data - not text data. If you want to
+convert decompressed output to text, you can chain an ``io.TextIOWrapper``
+to the stream::
+
+   with open(path, 'rb') as fh:
+       dctx = zstd.ZstdDecompressor()
+       stream_reader = dctx.stream_reader(fh)
+       text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')
+
+       for line in text_stream:
+           ...
+
+The ``read_across_frames`` argument to ``stream_reader()`` controls the
+behavior of read operations when the end of a zstd *frame* is encountered.
+When ``False`` (the default), a read will complete when the end of a
+zstd *frame* is encountered. When ``True``, a read can potentially
+return data spanning multiple zstd *frames*.
 
 Streaming Input API
 ^^^^^^^^^^^^^^^^^^^
 
-``stream_writer(fh)`` can be used to incrementally send compressed data to a
-decompressor.::
+``stream_writer(fh)`` allows you to *stream* data into a decompressor.
+
+Returned instances implement the ``io.RawIOBase`` interface. Only methods
+that involve writing will do useful things.
+
+The argument to ``stream_writer()`` is typically an object that also implements
+``io.RawIOBase``. But any object with a ``write(data)`` method will work. Many
+common Python types conform to this interface, including open file handles
+and ``io.BytesIO``.
+
+Behavior is similar to ``ZstdCompressor.stream_writer()``: compressed data
+is sent to the decompressor by calling ``write(data)`` and decompressed
+output is written to the underlying stream by calling its ``write(data)``
+method.::
 
     dctx = zstd.ZstdDecompressor()
-    with dctx.stream_writer(fh) as decompressor:
-        decompressor.write(compressed_data)
+    decompressor = dctx.stream_writer(fh)
 
-This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
-the decompressor by calling ``write(data)`` and decompressed output is written
-to the output object by calling its ``write(data)`` method.
+    decompressor.write(compressed_data)
+    ...
+
 
 Calls to ``write()`` will return the number of bytes written to the output
 object. Not all inputs will result in bytes being written, so return values
 of ``0`` are possible.
 
+Like the ``stream_writer()`` compressor, instances can be used as context
+managers. However, context managers add no extra special behavior and offer
+little to no benefit to being used.
+
+Calling ``close()`` will mark the stream as closed and subsequent I/O operations
+will raise ``ValueError`` (per the documented behavior of ``io.RawIOBase``).
+``close()`` will also call ``close()`` on the underlying stream if such a
+method exists.
+
 The size of chunks being ``write()`` to the destination can be specified::
 
     dctx = zstd.ZstdDecompressor()
@@ -687,6 +767,13 @@
     with dctx.stream_writer(fh) as decompressor:
         byte_size = decompressor.memory_size()
 
+``stream_writer()`` accepts a ``write_return_read`` boolean argument to control
+the return value of ``write()``. When ``False`` (the default)``, ``write()``
+returns the number of bytes that were ``write()``en to the underlying stream.
+When ``True``, ``write()`` returns the number of bytes read from the input.
+``True`` is the *proper* behavior for ``write()`` as specified by the
+``io.RawIOBase`` interface and will become the default in a future release.
+
 Streaming Output API
 ^^^^^^^^^^^^^^^^^^^^
 
@@ -791,6 +878,10 @@
    memory (re)allocations, this streaming decompression API isn't as
    efficient as other APIs.
 
+For compatibility with the standard library APIs, instances expose a
+``flush([length=None])`` method. This method no-ops and has no meaningful
+side-effects, making it safe to call any time.
+
 Batch Decompression API
 ^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -1147,18 +1238,21 @@
 * search_log
 * min_match
 * target_length
-* compression_strategy
+* strategy
+* compression_strategy (deprecated: same as ``strategy``)
 * write_content_size
 * write_checksum
 * write_dict_id
 * job_size
-* overlap_size_log
+* overlap_log
+* overlap_size_log (deprecated: same as ``overlap_log``)
 * force_max_window
 * enable_ldm
 * ldm_hash_log
 * ldm_min_match
 * ldm_bucket_size_log
-* ldm_hash_every_log
+* ldm_hash_rate_log
+* ldm_hash_every_log (deprecated: same as ``ldm_hash_rate_log``)
 * threads
 
 Some of these are very low-level settings. It may help to consult the official
@@ -1240,6 +1334,13 @@
 MAGIC_NUMBER
     Frame header as an integer
 
+FLUSH_BLOCK
+    Flushing behavior that denotes to flush a zstd block. A decompressor will
+    be able to decode all data fed into the compressor so far.
+FLUSH_FRAME
+    Flushing behavior that denotes to end a zstd frame. Any new data fed
+    to the compressor will start a new frame.
+
 CONTENTSIZE_UNKNOWN
     Value for content size when the content size is unknown.
 CONTENTSIZE_ERROR
@@ -1261,10 +1362,18 @@
     Minimum value for compression parameter
 SEARCHLOG_MAX
     Maximum value for compression parameter
+MINMATCH_MIN
+    Minimum value for compression parameter
+MINMATCH_MAX
+    Maximum value for compression parameter
 SEARCHLENGTH_MIN
     Minimum value for compression parameter
+
+    Deprecated: use ``MINMATCH_MIN``
 SEARCHLENGTH_MAX
     Maximum value for compression parameter
+
+    Deprecated: use ``MINMATCH_MAX``
 TARGETLENGTH_MIN
     Minimum value for compression parameter
 STRATEGY_FAST
@@ -1283,6 +1392,8 @@
     Compression strategy
 STRATEGY_BTULTRA
     Compression strategy
+STRATEGY_BTULTRA2
+    Compression strategy
 
 FORMAT_ZSTD1
     Zstandard frame format
diff --git a/contrib/python-zstandard/c-ext/compressionchunker.c b/contrib/python-zstandard/c-ext/compressionchunker.c
--- a/contrib/python-zstandard/c-ext/compressionchunker.c
+++ b/contrib/python-zstandard/c-ext/compressionchunker.c
@@ -43,7 +43,7 @@
 	/* If we have data left in the input, consume it. */
 	while (chunker->input.pos < chunker->input.size) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(chunker->compressor->cctx, &chunker->output,
+		zresult = ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output,
 			&chunker->input, ZSTD_e_continue);
 		Py_END_ALLOW_THREADS
 
@@ -104,7 +104,7 @@
 	}
 
 	Py_BEGIN_ALLOW_THREADS
-	zresult = ZSTD_compress_generic(chunker->compressor->cctx, &chunker->output,
+	zresult = ZSTD_compressStream2(chunker->compressor->cctx, &chunker->output,
 		&chunker->input, zFlushMode);
 	Py_END_ALLOW_THREADS
 
diff --git a/contrib/python-zstandard/c-ext/compressiondict.c b/contrib/python-zstandard/c-ext/compressiondict.c
--- a/contrib/python-zstandard/c-ext/compressiondict.c
+++ b/contrib/python-zstandard/c-ext/compressiondict.c
@@ -298,13 +298,9 @@
 		cParams = ZSTD_getCParams(level, 0, self->dictSize);
 	}
 	else {
-		cParams.chainLog = compressionParams->chainLog;
-		cParams.hashLog = compressionParams->hashLog;
-		cParams.searchLength = compressionParams->minMatch;
-		cParams.searchLog = compressionParams->searchLog;
-		cParams.strategy = compressionParams->compressionStrategy;
-		cParams.targetLength = compressionParams->targetLength;
-		cParams.windowLog = compressionParams->windowLog;
+		if (to_cparams(compressionParams, &cParams)) {
+			return NULL;
+		}
 	}
 
 	assert(!self->cdict);
diff --git a/contrib/python-zstandard/c-ext/compressionparams.c b/contrib/python-zstandard/c-ext/compressionparams.c
--- a/contrib/python-zstandard/c-ext/compressionparams.c
+++ b/contrib/python-zstandard/c-ext/compressionparams.c
@@ -10,7 +10,7 @@
 
 extern PyObject* ZstdError;
 
-int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned value) {
+int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value) {
 	size_t zresult = ZSTD_CCtxParam_setParameter(params, param, value);
 	if (ZSTD_isError(zresult)) {
 		PyErr_Format(ZstdError, "unable to set compression context parameter: %s",
@@ -23,28 +23,41 @@
 
 #define TRY_SET_PARAMETER(params, param, value) if (set_parameter(params, param, value)) return -1;
 
+#define TRY_COPY_PARAMETER(source, dest, param) { \
+	int result; \
+	size_t zresult = ZSTD_CCtxParam_getParameter(source, param, &result); \
+	if (ZSTD_isError(zresult)) { \
+		return 1; \
+	} \
+	zresult = ZSTD_CCtxParam_setParameter(dest, param, result); \
+	if (ZSTD_isError(zresult)) { \
+		return 1; \
+	} \
+}
+
 int set_parameters(ZSTD_CCtx_params* params, ZstdCompressionParametersObject* obj) {
-	TRY_SET_PARAMETER(params, ZSTD_p_format, obj->format);
-	TRY_SET_PARAMETER(params, ZSTD_p_compressionLevel, (unsigned)obj->compressionLevel);
-	TRY_SET_PARAMETER(params, ZSTD_p_windowLog, obj->windowLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_hashLog, obj->hashLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_chainLog, obj->chainLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_searchLog, obj->searchLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_minMatch, obj->minMatch);
-	TRY_SET_PARAMETER(params, ZSTD_p_targetLength, obj->targetLength);
-	TRY_SET_PARAMETER(params, ZSTD_p_compressionStrategy, obj->compressionStrategy);
-	TRY_SET_PARAMETER(params, ZSTD_p_contentSizeFlag, obj->contentSizeFlag);
-	TRY_SET_PARAMETER(params, ZSTD_p_checksumFlag, obj->checksumFlag);
-	TRY_SET_PARAMETER(params, ZSTD_p_dictIDFlag, obj->dictIDFlag);
-	TRY_SET_PARAMETER(params, ZSTD_p_nbWorkers, obj->threads);
-	TRY_SET_PARAMETER(params, ZSTD_p_jobSize, obj->jobSize);
-	TRY_SET_PARAMETER(params, ZSTD_p_overlapSizeLog, obj->overlapSizeLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_forceMaxWindow, obj->forceMaxWindow);
-	TRY_SET_PARAMETER(params, ZSTD_p_enableLongDistanceMatching, obj->enableLongDistanceMatching);
-	TRY_SET_PARAMETER(params, ZSTD_p_ldmHashLog, obj->ldmHashLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_ldmMinMatch, obj->ldmMinMatch);
-	TRY_SET_PARAMETER(params, ZSTD_p_ldmBucketSizeLog, obj->ldmBucketSizeLog);
-	TRY_SET_PARAMETER(params, ZSTD_p_ldmHashEveryLog, obj->ldmHashEveryLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_nbWorkers);
+
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_format);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_compressionLevel);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_windowLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_hashLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_chainLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_searchLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_minMatch);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_targetLength);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_strategy);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_contentSizeFlag);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_checksumFlag);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_dictIDFlag);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_jobSize);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_overlapLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_forceMaxWindow);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_enableLongDistanceMatching);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmMinMatch);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmBucketSizeLog);
+	TRY_COPY_PARAMETER(obj->params, params, ZSTD_c_ldmHashRateLog);
 
 	return 0;
 }
@@ -64,6 +77,41 @@
 	return set_parameters(params->params, params);
 }
 
+#define TRY_GET_PARAMETER(params, param, value) { \
+    size_t zresult = ZSTD_CCtxParam_getParameter(params, param, value); \
+    if (ZSTD_isError(zresult)) { \
+        PyErr_Format(ZstdError, "unable to retrieve parameter: %s", ZSTD_getErrorName(zresult)); \
+        return 1; \
+    } \
+}
+
+int to_cparams(ZstdCompressionParametersObject* params, ZSTD_compressionParameters* cparams) {
+	int value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_windowLog, &value);
+	cparams->windowLog = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_chainLog, &value);
+	cparams->chainLog = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_hashLog, &value);
+	cparams->hashLog = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_searchLog, &value);
+	cparams->searchLog = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_minMatch, &value);
+	cparams->minMatch = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_targetLength, &value);
+	cparams->targetLength = value;
+
+	TRY_GET_PARAMETER(params->params, ZSTD_c_strategy, &value);
+	cparams->strategy = value;
+
+	return 0;
+}
+
 static int ZstdCompressionParameters_init(ZstdCompressionParametersObject* self, PyObject* args, PyObject* kwargs) {
 	static char* kwlist[] = {
 		"format",
@@ -75,50 +123,60 @@
 		"min_match",
 		"target_length",
 		"compression_strategy",
+		"strategy",
 		"write_content_size",
 		"write_checksum",
 		"write_dict_id",
 		"job_size",
+		"overlap_log",
 		"overlap_size_log",
 		"force_max_window",
 		"enable_ldm",
 		"ldm_hash_log",
 		"ldm_min_match",
 		"ldm_bucket_size_log",
+		"ldm_hash_rate_log",
 		"ldm_hash_every_log",
 		"threads",
 		NULL
 	};
 
-	unsigned format = 0;
+	int format = 0;
 	int compressionLevel = 0;
-	unsigned windowLog = 0;
-	unsigned hashLog = 0;
-	unsigned chainLog = 0;
-	unsigned searchLog = 0;
-	unsigned minMatch = 0;
-	unsigned targetLength = 0;
-	unsigned compressionStrategy = 0;
-	unsigned contentSizeFlag = 1;
-	unsigned checksumFlag = 0;
-	unsigned dictIDFlag = 0;
-	unsigned jobSize = 0;
-	unsigned overlapSizeLog = 0;
-	unsigned forceMaxWindow = 0;
-	unsigned enableLDM = 0;
-	unsigned ldmHashLog = 0;
-	unsigned ldmMinMatch = 0;
-	unsigned ldmBucketSizeLog = 0;
-	unsigned ldmHashEveryLog = 0;
+	int windowLog = 0;
+	int hashLog = 0;
+	int chainLog = 0;
+	int searchLog = 0;
+	int minMatch = 0;
+	int targetLength = 0;
+	int compressionStrategy = -1;
+	int strategy = -1;
+	int contentSizeFlag = 1;
+	int checksumFlag = 0;
+	int dictIDFlag = 0;
+	int jobSize = 0;
+	int overlapLog = -1;
+	int overlapSizeLog = -1;
+	int forceMaxWindow = 0;
+	int enableLDM = 0;
+	int ldmHashLog = 0;
+	int ldmMinMatch = 0;
+	int ldmBucketSizeLog = 0;
+	int ldmHashRateLog = -1;
+	int ldmHashEveryLog = -1;
 	int threads = 0;
 
 	if (!PyArg_ParseTupleAndKeywords(args, kwargs,
-		"|IiIIIIIIIIIIIIIIIIIIi:CompressionParameters",
+		"|iiiiiiiiiiiiiiiiiiiiiiii:CompressionParameters",
 		kwlist, &format, &compressionLevel, &windowLog, &hashLog, &chainLog,
-		&searchLog, &minMatch, &targetLength, &compressionStrategy,
-		&contentSizeFlag, &checksumFlag, &dictIDFlag, &jobSize, &overlapSizeLog,
-		&forceMaxWindow, &enableLDM, &ldmHashLog, &ldmMinMatch, &ldmBucketSizeLog,
-		&ldmHashEveryLog, &threads)) {
+		&searchLog, &minMatch, &targetLength, &compressionStrategy, &strategy,
+		&contentSizeFlag, &checksumFlag, &dictIDFlag, &jobSize, &overlapLog,
+		&overlapSizeLog, &forceMaxWindow, &enableLDM, &ldmHashLog, &ldmMinMatch,
+		&ldmBucketSizeLog, &ldmHashRateLog, &ldmHashEveryLog, &threads)) {
+		return -1;
+	}
+
+	if (reset_params(self)) {
 		return -1;
 	}
 
@@ -126,32 +184,70 @@
 		threads = cpu_count();
 	}
 
-	self->format = format;
-	self->compressionLevel = compressionLevel;
-	self->windowLog = windowLog;
-	self->hashLog = hashLog;
-	self->chainLog = chainLog;
-	self->searchLog = searchLog;
-	self->minMatch = minMatch;
-	self->targetLength = targetLength;
-	self->compressionStrategy = compressionStrategy;
-	self->contentSizeFlag = contentSizeFlag;
-	self->checksumFlag = checksumFlag;
-	self->dictIDFlag = dictIDFlag;
-	self->threads = threads;
-	self->jobSize = jobSize;
-	self->overlapSizeLog = overlapSizeLog;
-	self->forceMaxWindow = forceMaxWindow;
-	self->enableLongDistanceMatching = enableLDM;
-	self->ldmHashLog = ldmHashLog;
-	self->ldmMinMatch = ldmMinMatch;
-	self->ldmBucketSizeLog = ldmBucketSizeLog;
-	self->ldmHashEveryLog = ldmHashEveryLog;
+	/* We need to set ZSTD_c_nbWorkers before ZSTD_c_jobSize and ZSTD_c_overlapLog
+	 * because setting ZSTD_c_nbWorkers resets the other parameters. */
+	TRY_SET_PARAMETER(self->params, ZSTD_c_nbWorkers, threads);
+
+	TRY_SET_PARAMETER(self->params, ZSTD_c_format, format);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_compressionLevel, compressionLevel);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_windowLog, windowLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_hashLog, hashLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_chainLog, chainLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_searchLog, searchLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_minMatch, minMatch);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_targetLength, targetLength);
 
-	if (reset_params(self)) {
+	if (compressionStrategy != -1 && strategy != -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot specify both compression_strategy and strategy");
+		return -1;
+    }
+
+	if (compressionStrategy != -1) {
+		strategy = compressionStrategy;
+	}
+	else if (strategy == -1) {
+		strategy = 0;
+	}
+
+	TRY_SET_PARAMETER(self->params, ZSTD_c_strategy, strategy);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_contentSizeFlag, contentSizeFlag);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_checksumFlag, checksumFlag);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_dictIDFlag, dictIDFlag);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_jobSize, jobSize);
+
+	if (overlapLog != -1 && overlapSizeLog != -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot specify both overlap_log and overlap_size_log");
 		return -1;
 	}
 
+	if (overlapSizeLog != -1) {
+		overlapLog = overlapSizeLog;
+	}
+	else if (overlapLog == -1) {
+		overlapLog = 0;
+	}
+
+	TRY_SET_PARAMETER(self->params, ZSTD_c_overlapLog, overlapLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_forceMaxWindow, forceMaxWindow);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_enableLongDistanceMatching, enableLDM);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashLog, ldmHashLog);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_ldmMinMatch, ldmMinMatch);
+	TRY_SET_PARAMETER(self->params, ZSTD_c_ldmBucketSizeLog, ldmBucketSizeLog);
+
+	if (ldmHashRateLog != -1 && ldmHashEveryLog != -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot specify both ldm_hash_rate_log and ldm_hash_everyLog");
+		return -1;
+	}
+
+	if (ldmHashEveryLog != -1) {
+		ldmHashRateLog = ldmHashEveryLog;
+	}
+	else if (ldmHashRateLog == -1) {
+		ldmHashRateLog = 0;
+	}
+
+	TRY_SET_PARAMETER(self->params, ZSTD_c_ldmHashRateLog, ldmHashRateLog);
+
 	return 0;
 }
 
@@ -259,7 +355,7 @@
 
 	val = PyDict_GetItemString(kwargs, "min_match");
 	if (!val) {
-		val = PyLong_FromUnsignedLong(params.searchLength);
+		val = PyLong_FromUnsignedLong(params.minMatch);
 		if (!val) {
 			goto cleanup;
 		}
@@ -336,6 +432,41 @@
 	PyObject_Del(self);
 }
 
+#define PARAM_GETTER(name, param) PyObject* ZstdCompressionParameters_get_##name(PyObject* self, void* unused) { \
+    int result; \
+    size_t zresult; \
+    ZstdCompressionParametersObject* p = (ZstdCompressionParametersObject*)(self); \
+    zresult = ZSTD_CCtxParam_getParameter(p->params, param, &result); \
+    if (ZSTD_isError(zresult)) { \
+        PyErr_Format(ZstdError, "unable to get compression parameter: %s", \
+            ZSTD_getErrorName(zresult)); \
+        return NULL; \
+    } \
+    return PyLong_FromLong(result); \
+}
+
+PARAM_GETTER(format, ZSTD_c_format)
+PARAM_GETTER(compression_level, ZSTD_c_compressionLevel)
+PARAM_GETTER(window_log, ZSTD_c_windowLog)
+PARAM_GETTER(hash_log, ZSTD_c_hashLog)
+PARAM_GETTER(chain_log, ZSTD_c_chainLog)
+PARAM_GETTER(search_log, ZSTD_c_searchLog)
+PARAM_GETTER(min_match, ZSTD_c_minMatch)
+PARAM_GETTER(target_length, ZSTD_c_targetLength)
+PARAM_GETTER(compression_strategy, ZSTD_c_strategy)
+PARAM_GETTER(write_content_size, ZSTD_c_contentSizeFlag)
+PARAM_GETTER(write_checksum, ZSTD_c_checksumFlag)
+PARAM_GETTER(write_dict_id, ZSTD_c_dictIDFlag)
+PARAM_GETTER(job_size, ZSTD_c_jobSize)
+PARAM_GETTER(overlap_log, ZSTD_c_overlapLog)
+PARAM_GETTER(force_max_window, ZSTD_c_forceMaxWindow)
+PARAM_GETTER(enable_ldm, ZSTD_c_enableLongDistanceMatching)
+PARAM_GETTER(ldm_hash_log, ZSTD_c_ldmHashLog)
+PARAM_GETTER(ldm_min_match, ZSTD_c_ldmMinMatch)
+PARAM_GETTER(ldm_bucket_size_log, ZSTD_c_ldmBucketSizeLog)
+PARAM_GETTER(ldm_hash_rate_log, ZSTD_c_ldmHashRateLog)
+PARAM_GETTER(threads, ZSTD_c_nbWorkers)
+
 static PyMethodDef ZstdCompressionParameters_methods[] = {
 	{
 		"from_level",
@@ -352,70 +483,34 @@
 	{ NULL, NULL }
 };
 
-static PyMemberDef ZstdCompressionParameters_members[] = {
-	{ "format", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, format), READONLY,
-	  "compression format" },
-	{ "compression_level", T_INT,
-	  offsetof(ZstdCompressionParametersObject, compressionLevel), READONLY,
-	  "compression level" },
-	{ "window_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, windowLog), READONLY,
-	  "window log" },
-	{ "hash_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, hashLog), READONLY,
-	  "hash log" },
-	{ "chain_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, chainLog), READONLY,
-	  "chain log" },
-	{ "search_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, searchLog), READONLY,
-	  "search log" },
-	{ "min_match", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, minMatch), READONLY,
-	  "search length" },
-	{ "target_length", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, targetLength), READONLY,
-	  "target length" },
-	{ "compression_strategy", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, compressionStrategy), READONLY,
-	  "compression strategy" },
-	{ "write_content_size", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, contentSizeFlag), READONLY,
-	  "whether to write content size in frames" },
-	{ "write_checksum", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, checksumFlag), READONLY,
-	  "whether to write checksum in frames" },
-	{ "write_dict_id", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, dictIDFlag), READONLY,
-	  "whether to write dictionary ID in frames" },
-	{ "threads", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, threads), READONLY,
-	  "number of threads to use" },
-	{ "job_size", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, jobSize), READONLY,
-	  "size of compression job when using multiple threads" },
-	{ "overlap_size_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, overlapSizeLog), READONLY,
-	  "Size of previous input reloaded at the beginning of each job" },
-	{ "force_max_window", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, forceMaxWindow), READONLY,
-	  "force back references to remain smaller than window size" },
-	{ "enable_ldm", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, enableLongDistanceMatching), READONLY,
-	  "whether to enable long distance matching" },
-	{ "ldm_hash_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, ldmHashLog), READONLY,
-	  "Size of the table for long distance matching, as a power of 2" },
-	{ "ldm_min_match", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, ldmMinMatch), READONLY,
-	  "minimum size of searched matches for long distance matcher" },
-	{ "ldm_bucket_size_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, ldmBucketSizeLog), READONLY,
-	  "log size of each bucket in the LDM hash table for collision resolution" },
-	{ "ldm_hash_every_log", T_UINT,
-	  offsetof(ZstdCompressionParametersObject, ldmHashEveryLog), READONLY,
-	  "frequency of inserting/looking up entries in the LDM hash table" },
+#define GET_SET_ENTRY(name) { #name, ZstdCompressionParameters_get_##name, NULL, NULL, NULL }
+
+static PyGetSetDef ZstdCompressionParameters_getset[] = {
+	GET_SET_ENTRY(format),
+	GET_SET_ENTRY(compression_level),
+	GET_SET_ENTRY(window_log),
+	GET_SET_ENTRY(hash_log),
+	GET_SET_ENTRY(chain_log),
+	GET_SET_ENTRY(search_log),
+	GET_SET_ENTRY(min_match),
+	GET_SET_ENTRY(target_length),
+	GET_SET_ENTRY(compression_strategy),
+	GET_SET_ENTRY(write_content_size),
+	GET_SET_ENTRY(write_checksum),
+	GET_SET_ENTRY(write_dict_id),
+	GET_SET_ENTRY(threads),
+	GET_SET_ENTRY(job_size),
+	GET_SET_ENTRY(overlap_log),
+	/* TODO remove this deprecated attribute */
+	{ "overlap_size_log", ZstdCompressionParameters_get_overlap_log, NULL, NULL, NULL },
+	GET_SET_ENTRY(force_max_window),
+	GET_SET_ENTRY(enable_ldm),
+	GET_SET_ENTRY(ldm_hash_log),
+	GET_SET_ENTRY(ldm_min_match),
+	GET_SET_ENTRY(ldm_bucket_size_log),
+	GET_SET_ENTRY(ldm_hash_rate_log),
+	/* TODO remove this deprecated attribute */
+	{ "ldm_hash_every_log", ZstdCompressionParameters_get_ldm_hash_rate_log, NULL, NULL, NULL },
 	{ NULL }
 };
 
@@ -448,8 +543,8 @@
 	0,                         /* tp_iter */
 	0,                         /* tp_iternext */
 	ZstdCompressionParameters_methods, /* tp_methods */
-	ZstdCompressionParameters_members, /* tp_members */
-	0,                         /* tp_getset */
+	0,                          /* tp_members */
+	ZstdCompressionParameters_getset,  /* tp_getset */
 	0,                         /* tp_base */
 	0,                         /* tp_dict */
 	0,                         /* tp_descr_get */
diff --git a/contrib/python-zstandard/c-ext/compressionreader.c b/contrib/python-zstandard/c-ext/compressionreader.c
--- a/contrib/python-zstandard/c-ext/compressionreader.c
+++ b/contrib/python-zstandard/c-ext/compressionreader.c
@@ -128,6 +128,96 @@
 	return PyLong_FromUnsignedLongLong(self->bytesCompressed);
 }
 
+int read_compressor_input(ZstdCompressionReader* self) {
+	if (self->finishedInput) {
+		return 0;
+	}
+
+	if (self->input.pos != self->input.size) {
+		return 0;
+	}
+
+	if (self->reader) {
+		Py_buffer buffer;
+
+		assert(self->readResult == NULL);
+
+		self->readResult = PyObject_CallMethod(self->reader, "read",
+		    "k", self->readSize);
+
+		if (NULL == self->readResult) {
+			return -1;
+		}
+
+		memset(&buffer, 0, sizeof(buffer));
+
+		if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) {
+			return -1;
+		}
+
+		/* EOF */
+		if (0 == buffer.len) {
+			self->finishedInput = 1;
+			Py_CLEAR(self->readResult);
+		}
+		else {
+			self->input.src = buffer.buf;
+			self->input.size = buffer.len;
+			self->input.pos = 0;
+		}
+
+		PyBuffer_Release(&buffer);
+	}
+	else {
+		assert(self->buffer.buf);
+
+		self->input.src = self->buffer.buf;
+		self->input.size = self->buffer.len;
+		self->input.pos = 0;
+	}
+
+	return 1;
+}
+
+int compress_input(ZstdCompressionReader* self, ZSTD_outBuffer* output) {
+	size_t oldPos;
+	size_t zresult;
+
+	/* If we have data left over, consume it. */
+	if (self->input.pos < self->input.size) {
+		oldPos = output->pos;
+
+		Py_BEGIN_ALLOW_THREADS
+		zresult = ZSTD_compressStream2(self->compressor->cctx,
+		    output, &self->input, ZSTD_e_continue);
+		Py_END_ALLOW_THREADS
+
+		self->bytesCompressed += output->pos - oldPos;
+
+		/* Input exhausted. Clear out state tracking. */
+		if (self->input.pos == self->input.size) {
+			memset(&self->input, 0, sizeof(self->input));
+			Py_CLEAR(self->readResult);
+
+			if (self->buffer.buf) {
+				self->finishedInput = 1;
+			}
+		}
+
+		if (ZSTD_isError(zresult)) {
+			PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
+			return -1;
+		}
+	}
+
+    if (output->pos && output->pos == output->size) {
+        return 1;
+    }
+    else {
+        return 0;
+    }
+}
+
 static PyObject* reader_read(ZstdCompressionReader* self, PyObject* args, PyObject* kwargs) {
 	static char* kwlist[] = {
 		"size",
@@ -140,25 +230,30 @@
 	Py_ssize_t resultSize;
 	size_t zresult;
 	size_t oldPos;
+	int readResult, compressResult;
 
 	if (self->closed) {
 		PyErr_SetString(PyExc_ValueError, "stream is closed");
 		return NULL;
 	}
 
-	if (self->finishedOutput) {
-		return PyBytes_FromStringAndSize("", 0);
-	}
-
-	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "n", kwlist, &size)) {
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) {
 		return NULL;
 	}
 
-	if (size < 1) {
-		PyErr_SetString(PyExc_ValueError, "cannot read negative or size 0 amounts");
+	if (size < -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
 		return NULL;
 	}
 
+	if (size == -1) {
+		return PyObject_CallMethod((PyObject*)self, "readall", NULL);
+	}
+
+	if (self->finishedOutput || size == 0) {
+		return PyBytes_FromStringAndSize("", 0);
+	}
+
 	result = PyBytes_FromStringAndSize(NULL, size);
 	if (NULL == result) {
 		return NULL;
@@ -172,86 +267,34 @@
 
 readinput:
 
-	/* If we have data left over, consume it. */
-	if (self->input.pos < self->input.size) {
-		oldPos = self->output.pos;
-
-		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->compressor->cctx,
-			&self->output, &self->input, ZSTD_e_continue);
-
-		Py_END_ALLOW_THREADS
-
-		self->bytesCompressed += self->output.pos - oldPos;
-
-		/* Input exhausted. Clear out state tracking. */
-		if (self->input.pos == self->input.size) {
-			memset(&self->input, 0, sizeof(self->input));
-			Py_CLEAR(self->readResult);
+    compressResult = compress_input(self, &self->output);
 
-			if (self->buffer.buf) {
-				self->finishedInput = 1;
-			}
-		}
-
-		if (ZSTD_isError(zresult)) {
-			PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
-			return NULL;
-		}
-
-		if (self->output.pos) {
-			/* If no more room in output, emit it. */
-			if (self->output.pos == self->output.size) {
-				memset(&self->output, 0, sizeof(self->output));
-				return result;
-			}
-
-			/*
-			 * There is room in the output. We fall through to below, which will either
-			 * get more input for us or will attempt to end the stream.
-			 */
-		}
-
-		/* Fall through to gather more input. */
+	if (-1 == compressResult) {
+		Py_XDECREF(result);
+		return NULL;
+	}
+	else if (0 == compressResult) {
+		/* There is room in the output. We fall through to below, which will
+		 * either get more input for us or will attempt to end the stream.
+		 */
+	}
+	else if (1 == compressResult) {
+		memset(&self->output, 0, sizeof(self->output));
+		return result;
+	}
+	else {
+		assert(0);
 	}
 
-	if (!self->finishedInput) {
-		if (self->reader) {
-			Py_buffer buffer;
-
-			assert(self->readResult == NULL);
-			self->readResult = PyObject_CallMethod(self->reader, "read",
-				"k", self->readSize);
-			if (self->readResult == NULL) {
-				return NULL;
-			}
-
-			memset(&buffer, 0, sizeof(buffer));
-
-			if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) {
-				return NULL;
-			}
+	readResult = read_compressor_input(self);
 
-			/* EOF */
-			if (0 == buffer.len) {
-				self->finishedInput = 1;
-				Py_CLEAR(self->readResult);
-			}
-			else {
-				self->input.src = buffer.buf;
-				self->input.size = buffer.len;
-				self->input.pos = 0;
-			}
-
-			PyBuffer_Release(&buffer);
-		}
-		else {
-			assert(self->buffer.buf);
-
-			self->input.src = self->buffer.buf;
-			self->input.size = self->buffer.len;
-			self->input.pos = 0;
-		}
+	if (-1 == readResult) {
+		return NULL;
+	}
+	else if (0 == readResult) { }
+	else if (1 == readResult) { }
+	else {
+		assert(0);
 	}
 
 	if (self->input.size) {
@@ -261,7 +304,7 @@
 	/* Else EOF */
 	oldPos = self->output.pos;
 
-	zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+	zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 		&self->input, ZSTD_e_end);
 
 	self->bytesCompressed += self->output.pos - oldPos;
@@ -269,6 +312,7 @@
 	if (ZSTD_isError(zresult)) {
 		PyErr_Format(ZstdError, "error ending compression stream: %s",
 			ZSTD_getErrorName(zresult));
+		Py_XDECREF(result);
 		return NULL;
 	}
 
@@ -288,9 +332,394 @@
 	return result;
 }
 
+static PyObject* reader_read1(ZstdCompressionReader* self, PyObject* args, PyObject* kwargs) {
+	static char* kwlist[] = {
+		"size",
+		NULL
+	};
+
+	Py_ssize_t size = -1;
+	PyObject* result = NULL;
+	char* resultBuffer;
+	Py_ssize_t resultSize;
+	ZSTD_outBuffer output;
+	int compressResult;
+	size_t oldPos;
+	size_t zresult;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n:read1", kwlist, &size)) {
+		return NULL;
+	}
+
+	if (size < -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
+		return NULL;
+	}
+
+	if (self->finishedOutput || size == 0) {
+		return PyBytes_FromStringAndSize("", 0);
+	}
+
+	if (size == -1) {
+		size = ZSTD_CStreamOutSize();
+	}
+
+	result = PyBytes_FromStringAndSize(NULL, size);
+	if (NULL == result) {
+		return NULL;
+	}
+
+	PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize);
+
+	output.dst = resultBuffer;
+	output.size = resultSize;
+	output.pos = 0;
+
+	/* read1() is supposed to use at most 1 read() from the underlying stream.
+	   However, we can't satisfy this requirement with compression because
+	   not every input will generate output. We /could/ flush the compressor,
+	   but this may not be desirable. We allow multiple read() from the
+	   underlying stream. But unlike read(), we return as soon as output data
+	   is available.
+	*/
+
+	compressResult = compress_input(self, &output);
+
+	if (-1 == compressResult) {
+		Py_XDECREF(result);
+		return NULL;
+	}
+	else if (0 == compressResult || 1 == compressResult) { }
+	else {
+		assert(0);
+	}
+
+	if (output.pos) {
+		goto finally;
+	}
+
+	while (!self->finishedInput) {
+		int readResult = read_compressor_input(self);
+
+		if (-1 == readResult) {
+			Py_XDECREF(result);
+			return NULL;
+		}
+		else if (0 == readResult || 1 == readResult) { }
+		else {
+			assert(0);
+		}
+
+		compressResult = compress_input(self, &output);
+
+		if (-1 == compressResult) {
+			Py_XDECREF(result);
+			return NULL;
+		}
+		else if (0 == compressResult || 1 == compressResult) { }
+		else {
+			assert(0);
+		}
+
+		if (output.pos) {
+			goto finally;
+		}
+	}
+
+	/* EOF */
+	oldPos = output.pos;
+
+	zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input,
+        ZSTD_e_end);
+
+	self->bytesCompressed += output.pos - oldPos;
+
+	if (ZSTD_isError(zresult)) {
+		PyErr_Format(ZstdError, "error ending compression stream: %s",
+		    ZSTD_getErrorName(zresult));
+		Py_XDECREF(result);
+		return NULL;
+	}
+
+	if (zresult == 0) {
+		self->finishedOutput = 1;
+	}
+
+finally:
+	if (result) {
+		if (safe_pybytes_resize(&result, output.pos)) {
+			Py_XDECREF(result);
+			return NULL;
+		}
+	}
+
+	return result;
+}
+
 static PyObject* reader_readall(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
-	return NULL;
+	PyObject* chunks = NULL;
+	PyObject* empty = NULL;
+	PyObject* result = NULL;
+
+	/* Our strategy is to collect chunks into a list then join all the
+	 * chunks at the end. We could potentially use e.g. an io.BytesIO. But
+	 * this feels simple enough to implement and avoids potentially expensive
+	 * reallocations of large buffers.
+	 */
+	chunks = PyList_New(0);
+	if (NULL == chunks) {
+		return NULL;
+	}
+
+	while (1) {
+		PyObject* chunk = PyObject_CallMethod(self, "read", "i", 1048576);
+		if (NULL == chunk) {
+			Py_DECREF(chunks);
+			return NULL;
+		}
+
+		if (!PyBytes_Size(chunk)) {
+			Py_DECREF(chunk);
+			break;
+		}
+
+		if (PyList_Append(chunks, chunk)) {
+			Py_DECREF(chunk);
+			Py_DECREF(chunks);
+			return NULL;
+		}
+
+		Py_DECREF(chunk);
+	}
+
+	empty = PyBytes_FromStringAndSize("", 0);
+	if (NULL == empty) {
+		Py_DECREF(chunks);
+		return NULL;
+	}
+
+	result = PyObject_CallMethod(empty, "join", "O", chunks);
+
+	Py_DECREF(empty);
+	Py_DECREF(chunks);
+
+	return result;
+}
+
+static PyObject* reader_readinto(ZstdCompressionReader* self, PyObject* args) {
+	Py_buffer dest;
+	ZSTD_outBuffer output;
+	int readResult, compressResult;
+	PyObject* result = NULL;
+	size_t zresult;
+	size_t oldPos;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (self->finishedOutput) {
+		return PyLong_FromLong(0);
+	}
+
+	if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) {
+		return NULL;
+	}
+
+	if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
+		PyErr_SetString(PyExc_ValueError,
+		    "destination buffer should be contiguous and have at most one dimension");
+		goto finally;
+	}
+
+	output.dst = dest.buf;
+	output.size = dest.len;
+	output.pos = 0;
+
+	compressResult = compress_input(self, &output);
+
+	if (-1 == compressResult) {
+		goto finally;
+	}
+	else if (0 == compressResult) {	}
+	else if (1 == compressResult) {
+		result = PyLong_FromSize_t(output.pos);
+		goto finally;
+	}
+	else {
+		assert(0);
+	}
+
+	while (!self->finishedInput) {
+		readResult = read_compressor_input(self);
+
+		if (-1 == readResult) {
+			goto finally;
+		}
+		else if (0 == readResult || 1 == readResult) {}
+		else {
+			assert(0);
+		}
+
+		compressResult = compress_input(self, &output);
+
+		if (-1 == compressResult) {
+			goto finally;
+		}
+		else if (0 == compressResult) { }
+		else if (1 == compressResult) {
+			result = PyLong_FromSize_t(output.pos);
+			goto finally;
+		}
+		else {
+			assert(0);
+		}
+	}
+
+	/* EOF */
+	oldPos = output.pos;
+
+	zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input,
+	    ZSTD_e_end);
+
+	self->bytesCompressed += self->output.pos - oldPos;
+
+	if (ZSTD_isError(zresult)) {
+		PyErr_Format(ZstdError, "error ending compression stream: %s",
+		    ZSTD_getErrorName(zresult));
+		goto finally;
+	}
+
+	assert(output.pos);
+
+	if (0 == zresult) {
+		self->finishedOutput = 1;
+	}
+
+	result = PyLong_FromSize_t(output.pos);
+
+finally:
+	PyBuffer_Release(&dest);
+
+	return result;
+}
+
+static PyObject* reader_readinto1(ZstdCompressionReader* self, PyObject* args) {
+	Py_buffer dest;
+	PyObject* result = NULL;
+	ZSTD_outBuffer output;
+	int compressResult;
+	size_t oldPos;
+	size_t zresult;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (self->finishedOutput) {
+		return PyLong_FromLong(0);
+	}
+
+	if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) {
+		return NULL;
+	}
+
+	if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
+		PyErr_SetString(PyExc_ValueError,
+		    "destination buffer should be contiguous and have at most one dimension");
+		goto finally;
+	}
+
+	output.dst = dest.buf;
+	output.size = dest.len;
+	output.pos = 0;
+
+	compressResult = compress_input(self, &output);
+
+	if (-1 == compressResult) {
+		goto finally;
+	}
+	else if (0 == compressResult || 1 == compressResult) { }
+	else {
+		assert(0);
+	}
+
+	if (output.pos) {
+		result = PyLong_FromSize_t(output.pos);
+		goto finally;
+	}
+
+	while (!self->finishedInput) {
+		int readResult = read_compressor_input(self);
+
+		if (-1 == readResult) {
+			goto finally;
+		}
+		else if (0 == readResult || 1 == readResult) { }
+		else {
+			assert(0);
+		}
+
+		compressResult = compress_input(self, &output);
+
+		if (-1 == compressResult) {
+			goto finally;
+		}
+		else if (0 == compressResult) { }
+		else if (1 == compressResult) {
+			result = PyLong_FromSize_t(output.pos);
+			goto finally;
+		}
+		else {
+			assert(0);
+		}
+
+		/* If we produced output and we're not done with input, emit
+		 * that output now, as we've hit restrictions of read1().
+		 */
+		if (output.pos && !self->finishedInput) {
+			result = PyLong_FromSize_t(output.pos);
+			goto finally;
+		}
+
+		/* Otherwise we either have no output or we've exhausted the
+		 * input. Either we try to get more input or we fall through
+		 * to EOF below */
+	}
+
+	/* EOF */
+	oldPos = output.pos;
+
+	zresult = ZSTD_compressStream2(self->compressor->cctx, &output, &self->input,
+	    ZSTD_e_end);
+
+	self->bytesCompressed += self->output.pos - oldPos;
+
+	if (ZSTD_isError(zresult)) {
+		PyErr_Format(ZstdError, "error ending compression stream: %s",
+		    ZSTD_getErrorName(zresult));
+		goto finally;
+	}
+
+	assert(output.pos);
+
+	if (0 == zresult) {
+		self->finishedOutput = 1;
+	}
+
+	result = PyLong_FromSize_t(output.pos);
+
+finally:
+	PyBuffer_Release(&dest);
+
+	return result;
 }
 
 static PyObject* reader_iter(PyObject* self) {
@@ -315,7 +744,10 @@
 	{ "readable", (PyCFunction)reader_readable, METH_NOARGS,
 	PyDoc_STR("Returns True") },
 	{ "read", (PyCFunction)reader_read, METH_VARARGS | METH_KEYWORDS, PyDoc_STR("read compressed data") },
+	{ "read1", (PyCFunction)reader_read1, METH_VARARGS | METH_KEYWORDS, NULL },
 	{ "readall", (PyCFunction)reader_readall, METH_NOARGS, PyDoc_STR("Not implemented") },
+	{ "readinto", (PyCFunction)reader_readinto, METH_VARARGS, NULL },
+	{ "readinto1", (PyCFunction)reader_readinto1, METH_VARARGS, NULL },
 	{ "readline", (PyCFunction)reader_readline, METH_VARARGS, PyDoc_STR("Not implemented") },
 	{ "readlines", (PyCFunction)reader_readlines, METH_VARARGS, PyDoc_STR("Not implemented") },
 	{ "seekable", (PyCFunction)reader_seekable, METH_NOARGS,
diff --git a/contrib/python-zstandard/c-ext/compressionwriter.c b/contrib/python-zstandard/c-ext/compressionwriter.c
--- a/contrib/python-zstandard/c-ext/compressionwriter.c
+++ b/contrib/python-zstandard/c-ext/compressionwriter.c
@@ -18,24 +18,23 @@
 	Py_XDECREF(self->compressor);
 	Py_XDECREF(self->writer);
 
+	PyMem_Free(self->output.dst);
+	self->output.dst = NULL;
+
 	PyObject_Del(self);
 }
 
 static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) {
-	size_t zresult;
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
 
 	if (self->entered) {
 		PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
 		return NULL;
 	}
 
-	zresult = ZSTD_CCtx_setPledgedSrcSize(self->compressor->cctx, self->sourceSize);
-	if (ZSTD_isError(zresult)) {
-		PyErr_Format(ZstdError, "error setting source size: %s",
-			ZSTD_getErrorName(zresult));
-		return NULL;
-	}
-
 	self->entered = 1;
 
 	Py_INCREF(self);
@@ -46,10 +45,6 @@
 	PyObject* exc_type;
 	PyObject* exc_value;
 	PyObject* exc_tb;
-	size_t zresult;
-
-	ZSTD_outBuffer output;
-	PyObject* res;
 
 	if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) {
 		return NULL;
@@ -58,46 +53,11 @@
 	self->entered = 0;
 
 	if (exc_type == Py_None && exc_value == Py_None && exc_tb == Py_None) {
-		ZSTD_inBuffer inBuffer;
-
-		inBuffer.src = NULL;
-		inBuffer.size = 0;
-		inBuffer.pos = 0;
-
-		output.dst = PyMem_Malloc(self->outSize);
-		if (!output.dst) {
-			return PyErr_NoMemory();
-		}
-		output.size = self->outSize;
-		output.pos = 0;
+		PyObject* result = PyObject_CallMethod((PyObject*)self, "close", NULL);
 
-		while (1) {
-			zresult = ZSTD_compress_generic(self->compressor->cctx, &output, &inBuffer, ZSTD_e_end);
-			if (ZSTD_isError(zresult)) {
-				PyErr_Format(ZstdError, "error ending compression stream: %s",
-					ZSTD_getErrorName(zresult));
-				PyMem_Free(output.dst);
-				return NULL;
-			}
-
-			if (output.pos) {
-#if PY_MAJOR_VERSION >= 3
-				res = PyObject_CallMethod(self->writer, "write", "y#",
-#else
-				res = PyObject_CallMethod(self->writer, "write", "s#",
-#endif
-					output.dst, output.pos);
-				Py_XDECREF(res);
-			}
-
-			if (!zresult) {
-				break;
-			}
-
-			output.pos = 0;
+		if (NULL == result) {
+			return NULL;
 		}
-
-		PyMem_Free(output.dst);
 	}
 
 	Py_RETURN_FALSE;
@@ -117,7 +77,6 @@
 	Py_buffer source;
 	size_t zresult;
 	ZSTD_inBuffer input;
-	ZSTD_outBuffer output;
 	PyObject* res;
 	Py_ssize_t totalWrite = 0;
 
@@ -130,143 +89,240 @@
 		return NULL;
 	}
 
-	if (!self->entered) {
-		PyErr_SetString(ZstdError, "compress must be called from an active context manager");
-		goto finally;
-	}
-
 	if (!PyBuffer_IsContiguous(&source, 'C') || source.ndim > 1) {
 		PyErr_SetString(PyExc_ValueError,
 			"data buffer should be contiguous and have at most one dimension");
 		goto finally;
 	}
 
-	output.dst = PyMem_Malloc(self->outSize);
-	if (!output.dst) {
-		PyErr_NoMemory();
-		goto finally;
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
 	}
-	output.size = self->outSize;
-	output.pos = 0;
+
+	self->output.pos = 0;
 
 	input.src = source.buf;
 	input.size = source.len;
 	input.pos = 0;
 
-	while ((ssize_t)input.pos < source.len) {
+	while (input.pos < (size_t)source.len) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->compressor->cctx, &output, &input, ZSTD_e_continue);
+		zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, &input, ZSTD_e_continue);
 		Py_END_ALLOW_THREADS
 
 		if (ZSTD_isError(zresult)) {
-			PyMem_Free(output.dst);
 			PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
 			goto finally;
 		}
 
 		/* Copy data from output buffer to writer. */
-		if (output.pos) {
+		if (self->output.pos) {
 #if PY_MAJOR_VERSION >= 3
 			res = PyObject_CallMethod(self->writer, "write", "y#",
 #else
 			res = PyObject_CallMethod(self->writer, "write", "s#",
 #endif
-				output.dst, output.pos);
+				self->output.dst, self->output.pos);
 			Py_XDECREF(res);
-			totalWrite += output.pos;
-			self->bytesCompressed += output.pos;
+			totalWrite += self->output.pos;
+			self->bytesCompressed += self->output.pos;
 		}
-		output.pos = 0;
+		self->output.pos = 0;
 	}
 
-	PyMem_Free(output.dst);
-
-	result = PyLong_FromSsize_t(totalWrite);
+	if (self->writeReturnRead) {
+		result = PyLong_FromSize_t(input.pos);
+	}
+	else {
+		result = PyLong_FromSsize_t(totalWrite);
+	}
 
 finally:
 	PyBuffer_Release(&source);
 	return result;
 }
 
-static PyObject* ZstdCompressionWriter_flush(ZstdCompressionWriter* self, PyObject* args) {
+static PyObject* ZstdCompressionWriter_flush(ZstdCompressionWriter* self, PyObject* args, PyObject* kwargs) {
+	static char* kwlist[] = {
+		"flush_mode",
+		NULL
+	};
+
 	size_t zresult;
-	ZSTD_outBuffer output;
 	ZSTD_inBuffer input;
 	PyObject* res;
 	Py_ssize_t totalWrite = 0;
+	unsigned flush_mode = 0;
+	ZSTD_EndDirective flush;
 
-	if (!self->entered) {
-		PyErr_SetString(ZstdError, "flush must be called from an active context manager");
+    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|I:flush",
+		kwlist, &flush_mode)) {
 		return NULL;
 	}
 
+	switch (flush_mode) {
+		case 0:
+			flush = ZSTD_e_flush;
+			break;
+		case 1:
+			flush = ZSTD_e_end;
+			break;
+		default:
+			PyErr_Format(PyExc_ValueError, "unknown flush_mode: %d", flush_mode);
+			return NULL;
+	}
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	self->output.pos = 0;
+
 	input.src = NULL;
 	input.size = 0;
 	input.pos = 0;
 
-	output.dst = PyMem_Malloc(self->outSize);
-	if (!output.dst) {
-		return PyErr_NoMemory();
-	}
-	output.size = self->outSize;
-	output.pos = 0;
-
 	while (1) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->compressor->cctx, &output, &input, ZSTD_e_flush);
+		zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output, &input, flush);
 		Py_END_ALLOW_THREADS
 
 		if (ZSTD_isError(zresult)) {
-			PyMem_Free(output.dst);
 			PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
 			return NULL;
 		}
 
 		/* Copy data from output buffer to writer. */
-		if (output.pos) {
+		if (self->output.pos) {
 #if PY_MAJOR_VERSION >= 3
 			res = PyObject_CallMethod(self->writer, "write", "y#",
 #else
 			res = PyObject_CallMethod(self->writer, "write", "s#",
 #endif
-				output.dst, output.pos);
+				self->output.dst, self->output.pos);
 			Py_XDECREF(res);
-			totalWrite += output.pos;
-			self->bytesCompressed += output.pos;
+			totalWrite += self->output.pos;
+			self->bytesCompressed += self->output.pos;
 		}
 
-		output.pos = 0;
+		self->output.pos = 0;
 
 		if (!zresult) {
 			break;
 		}
 	}
 
-	PyMem_Free(output.dst);
+	return PyLong_FromSsize_t(totalWrite);
+}
+
+static PyObject* ZstdCompressionWriter_close(ZstdCompressionWriter* self) {
+	PyObject* result;
+
+	if (self->closed) {
+		Py_RETURN_NONE;
+	}
+
+	result = PyObject_CallMethod((PyObject*)self, "flush", "I", 1);
+	self->closed = 1;
+
+	if (NULL == result) {
+	    return NULL;
+	}
 
-	return PyLong_FromSsize_t(totalWrite);
+    /* Call close on underlying stream as well. */
+	if (PyObject_HasAttrString(self->writer, "close")) {
+		return PyObject_CallMethod(self->writer, "close", NULL);
+	}
+
+	Py_RETURN_NONE;
+}
+
+static PyObject* ZstdCompressionWriter_fileno(ZstdCompressionWriter* self) {
+	if (PyObject_HasAttrString(self->writer, "fileno")) {
+		return PyObject_CallMethod(self->writer, "fileno", NULL);
+	}
+	else {
+		PyErr_SetString(PyExc_OSError, "fileno not available on underlying writer");
+		return NULL;
+	}
 }
 
 static PyObject* ZstdCompressionWriter_tell(ZstdCompressionWriter* self) {
 	return PyLong_FromUnsignedLongLong(self->bytesCompressed);
 }
 
+static PyObject* ZstdCompressionWriter_writelines(PyObject* self, PyObject* args) {
+	PyErr_SetNone(PyExc_NotImplementedError);
+	return NULL;
+}
+
+static PyObject* ZstdCompressionWriter_false(PyObject* self, PyObject* args) {
+	Py_RETURN_FALSE;
+}
+
+static PyObject* ZstdCompressionWriter_true(PyObject* self, PyObject* args) {
+	Py_RETURN_TRUE;
+}
+
+static PyObject* ZstdCompressionWriter_unsupported(PyObject* self, PyObject* args, PyObject* kwargs) {
+	PyObject* iomod;
+	PyObject* exc;
+
+	iomod = PyImport_ImportModule("io");
+	if (NULL == iomod) {
+		return NULL;
+	}
+
+	exc = PyObject_GetAttrString(iomod, "UnsupportedOperation");
+	if (NULL == exc) {
+		Py_DECREF(iomod);
+		return NULL;
+	}
+
+	PyErr_SetNone(exc);
+	Py_DECREF(exc);
+	Py_DECREF(iomod);
+
+	return NULL;
+}
+
 static PyMethodDef ZstdCompressionWriter_methods[] = {
 	{ "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS,
 	PyDoc_STR("Enter a compression context.") },
 	{ "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS,
 	PyDoc_STR("Exit a compression context.") },
+	{ "close", (PyCFunction)ZstdCompressionWriter_close, METH_NOARGS, NULL },
+	{ "fileno", (PyCFunction)ZstdCompressionWriter_fileno, METH_NOARGS, NULL },
+	{ "isatty", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL },
+	{ "readable", (PyCFunction)ZstdCompressionWriter_false, METH_NOARGS, NULL },
+	{ "readline", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readlines", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "seek", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "seekable", ZstdCompressionWriter_false, METH_NOARGS, NULL },
+	{ "truncate", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "writable", ZstdCompressionWriter_true, METH_NOARGS, NULL },
+	{ "writelines", ZstdCompressionWriter_writelines, METH_VARARGS, NULL },
+	{ "read", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readall", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readinto", (PyCFunction)ZstdCompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
 	{ "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS,
 	PyDoc_STR("Obtain the memory size of the underlying compressor") },
 	{ "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS | METH_KEYWORDS,
 	PyDoc_STR("Compress data") },
-	{ "flush", (PyCFunction)ZstdCompressionWriter_flush, METH_NOARGS,
+	{ "flush", (PyCFunction)ZstdCompressionWriter_flush, METH_VARARGS | METH_KEYWORDS,
 	PyDoc_STR("Flush data and finish a zstd frame") },
 	{ "tell", (PyCFunction)ZstdCompressionWriter_tell, METH_NOARGS,
 	PyDoc_STR("Returns current number of bytes compressed") },
 	{ NULL, NULL }
 };
 
+static PyMemberDef ZstdCompressionWriter_members[] = {
+	 { "closed", T_BOOL, offsetof(ZstdCompressionWriter, closed), READONLY, NULL },
+	 { NULL }
+};
+
 PyTypeObject ZstdCompressionWriterType = {
 	PyVarObject_HEAD_INIT(NULL, 0)
 	"zstd.ZstdCompressionWriter",  /* tp_name */
@@ -296,7 +352,7 @@
 	0,                              /* tp_iter */
 	0,                              /* tp_iternext */
 	ZstdCompressionWriter_methods,  /* tp_methods */
-	0,                              /* tp_members */
+	ZstdCompressionWriter_members,  /* tp_members */
 	0,                              /* tp_getset */
 	0,                              /* tp_base */
 	0,                              /* tp_dict */
diff --git a/contrib/python-zstandard/c-ext/compressobj.c b/contrib/python-zstandard/c-ext/compressobj.c
--- a/contrib/python-zstandard/c-ext/compressobj.c
+++ b/contrib/python-zstandard/c-ext/compressobj.c
@@ -59,9 +59,9 @@
 	input.size = source.len;
 	input.pos = 0;
 
-	while ((ssize_t)input.pos < source.len) {
+	while (input.pos < (size_t)source.len) {
 		Py_BEGIN_ALLOW_THREADS
-			zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+			zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 				&input, ZSTD_e_continue);
 		Py_END_ALLOW_THREADS
 
@@ -154,7 +154,7 @@
 
 	while (1) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+		zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 			&input, zFlushMode);
 		Py_END_ALLOW_THREADS
 
diff --git a/contrib/python-zstandard/c-ext/compressor.c b/contrib/python-zstandard/c-ext/compressor.c
--- a/contrib/python-zstandard/c-ext/compressor.c
+++ b/contrib/python-zstandard/c-ext/compressor.c
@@ -204,27 +204,27 @@
 		}
 	}
 	else {
-		if (set_parameter(self->params, ZSTD_p_compressionLevel, level)) {
+		if (set_parameter(self->params, ZSTD_c_compressionLevel, level)) {
 			return -1;
 		}
 
-		if (set_parameter(self->params, ZSTD_p_contentSizeFlag,
+		if (set_parameter(self->params, ZSTD_c_contentSizeFlag,
 			writeContentSize ? PyObject_IsTrue(writeContentSize) : 1)) {
 			return -1;
 		}
 
-		if (set_parameter(self->params, ZSTD_p_checksumFlag,
+		if (set_parameter(self->params, ZSTD_c_checksumFlag,
 			writeChecksum ? PyObject_IsTrue(writeChecksum) : 0)) {
 			return -1;
 		}
 
-		if (set_parameter(self->params, ZSTD_p_dictIDFlag,
+		if (set_parameter(self->params, ZSTD_c_dictIDFlag,
 			writeDictID ? PyObject_IsTrue(writeDictID) : 1)) {
 			return -1;
 		}
 
 		if (threads) {
-			if (set_parameter(self->params, ZSTD_p_nbWorkers, threads)) {
+			if (set_parameter(self->params, ZSTD_c_nbWorkers, threads)) {
 				return -1;
 			}
 		}
@@ -344,7 +344,7 @@
 		return NULL;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize);
 	if (ZSTD_isError(zresult)) {
@@ -391,7 +391,7 @@
 
 		while (input.pos < input.size) {
 			Py_BEGIN_ALLOW_THREADS
-			zresult = ZSTD_compress_generic(self->cctx, &output, &input, ZSTD_e_continue);
+			zresult = ZSTD_compressStream2(self->cctx, &output, &input, ZSTD_e_continue);
 			Py_END_ALLOW_THREADS
 
 			if (ZSTD_isError(zresult)) {
@@ -421,7 +421,7 @@
 
 	while (1) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->cctx, &output, &input, ZSTD_e_end);
+		zresult = ZSTD_compressStream2(self->cctx, &output, &input, ZSTD_e_end);
 		Py_END_ALLOW_THREADS
 
 		if (ZSTD_isError(zresult)) {
@@ -517,7 +517,7 @@
 		goto except;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize);
 	if (ZSTD_isError(zresult)) {
@@ -577,7 +577,7 @@
 		goto finally;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	destSize = ZSTD_compressBound(source.len);
 	output = PyBytes_FromStringAndSize(NULL, destSize);
@@ -605,7 +605,7 @@
 	/* By avoiding ZSTD_compress(), we don't necessarily write out content
 		size. This means the argument to ZstdCompressor to control frame
 		parameters is honored. */
-	zresult = ZSTD_compress_generic(self->cctx, &outBuffer, &inBuffer, ZSTD_e_end);
+	zresult = ZSTD_compressStream2(self->cctx, &outBuffer, &inBuffer, ZSTD_e_end);
 	Py_END_ALLOW_THREADS
 
 	if (ZSTD_isError(zresult)) {
@@ -651,7 +651,7 @@
 		return NULL;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, inSize);
 	if (ZSTD_isError(zresult)) {
@@ -740,7 +740,7 @@
 		goto except;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize);
 	if (ZSTD_isError(zresult)) {
@@ -794,16 +794,19 @@
 		"writer",
 		"size",
 		"write_size",
+		"write_return_read",
 		NULL
 	};
 
 	PyObject* writer;
 	ZstdCompressionWriter* result;
+	size_t zresult;
 	unsigned long long sourceSize = ZSTD_CONTENTSIZE_UNKNOWN;
 	size_t outSize = ZSTD_CStreamOutSize();
+	PyObject* writeReturnRead = NULL;
 
-	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|Kk:stream_writer", kwlist,
-		&writer, &sourceSize, &outSize)) {
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|KkO:stream_writer", kwlist,
+		&writer, &sourceSize, &outSize, &writeReturnRead)) {
 		return NULL;
 	}
 
@@ -812,22 +815,38 @@
 		return NULL;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
+
+	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize);
+	if (ZSTD_isError(zresult)) {
+		PyErr_Format(ZstdError, "error setting source size: %s",
+			ZSTD_getErrorName(zresult));
+		return NULL;
+	}
 
 	result = (ZstdCompressionWriter*)PyObject_CallObject((PyObject*)&ZstdCompressionWriterType, NULL);
 	if (!result) {
 		return NULL;
 	}
 
+	result->output.dst = PyMem_Malloc(outSize);
+	if (!result->output.dst) {
+		Py_DECREF(result);
+		return (ZstdCompressionWriter*)PyErr_NoMemory();
+	}
+
+	result->output.pos = 0;
+	result->output.size = outSize;
+
 	result->compressor = self;
 	Py_INCREF(result->compressor);
 
 	result->writer = writer;
 	Py_INCREF(result->writer);
 
-	result->sourceSize = sourceSize;
 	result->outSize = outSize;
 	result->bytesCompressed = 0;
+	result->writeReturnRead = writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 0;
 
 	return result;
 }
@@ -853,7 +872,7 @@
 		return NULL;
 	}
 
-	ZSTD_CCtx_reset(self->cctx);
+	ZSTD_CCtx_reset(self->cctx, ZSTD_reset_session_only);
 
 	zresult = ZSTD_CCtx_setPledgedSrcSize(self->cctx, sourceSize);
 	if (ZSTD_isError(zresult)) {
@@ -1115,7 +1134,7 @@
 			break;
 		}
 
-		zresult = ZSTD_compress_generic(state->cctx, &opOutBuffer, &opInBuffer, ZSTD_e_end);
+		zresult = ZSTD_compressStream2(state->cctx, &opOutBuffer, &opInBuffer, ZSTD_e_end);
 		if (ZSTD_isError(zresult)) {
 			state->error = WorkerError_zstd;
 			state->zresult = zresult;
diff --git a/contrib/python-zstandard/c-ext/compressoriterator.c b/contrib/python-zstandard/c-ext/compressoriterator.c
--- a/contrib/python-zstandard/c-ext/compressoriterator.c
+++ b/contrib/python-zstandard/c-ext/compressoriterator.c
@@ -57,7 +57,7 @@
 	/* If we have data left in the input, consume it. */
 	if (self->input.pos < self->input.size) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+		zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 			&self->input, ZSTD_e_continue);
 		Py_END_ALLOW_THREADS
 
@@ -127,7 +127,7 @@
 		self->input.size = 0;
 		self->input.pos = 0;
 
-		zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+		zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 			&self->input, ZSTD_e_end);
 		if (ZSTD_isError(zresult)) {
 			PyErr_Format(ZstdError, "error ending compression stream: %s",
@@ -152,7 +152,7 @@
 	self->input.pos = 0;
 
 	Py_BEGIN_ALLOW_THREADS
-	zresult = ZSTD_compress_generic(self->compressor->cctx, &self->output,
+	zresult = ZSTD_compressStream2(self->compressor->cctx, &self->output,
 		&self->input, ZSTD_e_continue);
 	Py_END_ALLOW_THREADS
 
diff --git a/contrib/python-zstandard/c-ext/constants.c b/contrib/python-zstandard/c-ext/constants.c
--- a/contrib/python-zstandard/c-ext/constants.c
+++ b/contrib/python-zstandard/c-ext/constants.c
@@ -32,6 +32,9 @@
 	ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL);
 	PyModule_AddObject(mod, "ZstdError", ZstdError);
 
+	PyModule_AddIntConstant(mod, "FLUSH_BLOCK", 0);
+	PyModule_AddIntConstant(mod, "FLUSH_FRAME", 1);
+
 	PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_FINISH", compressorobj_flush_finish);
 	PyModule_AddIntConstant(mod, "COMPRESSOBJ_FLUSH_BLOCK", compressorobj_flush_block);
 
@@ -77,8 +80,11 @@
 	PyModule_AddIntConstant(mod, "HASHLOG3_MAX", ZSTD_HASHLOG3_MAX);
 	PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN);
 	PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX);
-	PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_SEARCHLENGTH_MIN);
-	PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_SEARCHLENGTH_MAX);
+	PyModule_AddIntConstant(mod, "MINMATCH_MIN", ZSTD_MINMATCH_MIN);
+	PyModule_AddIntConstant(mod, "MINMATCH_MAX", ZSTD_MINMATCH_MAX);
+	/* TODO SEARCHLENGTH_* is deprecated. */
+	PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_MINMATCH_MIN);
+	PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_MINMATCH_MAX);
 	PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN);
 	PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX);
 	PyModule_AddIntConstant(mod, "LDM_MINMATCH_MIN", ZSTD_LDM_MINMATCH_MIN);
@@ -93,6 +99,7 @@
 	PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2);
 	PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt);
 	PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA", ZSTD_btultra);
+	PyModule_AddIntConstant(mod, "STRATEGY_BTULTRA2", ZSTD_btultra2);
 
 	PyModule_AddIntConstant(mod, "DICT_TYPE_AUTO", ZSTD_dct_auto);
 	PyModule_AddIntConstant(mod, "DICT_TYPE_RAWCONTENT", ZSTD_dct_rawContent);
diff --git a/contrib/python-zstandard/c-ext/decompressionreader.c b/contrib/python-zstandard/c-ext/decompressionreader.c
--- a/contrib/python-zstandard/c-ext/decompressionreader.c
+++ b/contrib/python-zstandard/c-ext/decompressionreader.c
@@ -102,6 +102,114 @@
 	Py_RETURN_FALSE;
 }
 
+/**
+ * Read available input.
+ *
+ * Returns 0 if no data was added to input.
+ * Returns 1 if new input data is available.
+ * Returns -1 on error and sets a Python exception as a side-effect.
+ */
+int read_decompressor_input(ZstdDecompressionReader* self) {
+	if (self->finishedInput) {
+		return 0;
+	}
+
+	if (self->input.pos != self->input.size) {
+		return 0;
+	}
+
+	if (self->reader) {
+        Py_buffer buffer;
+
+        assert(self->readResult == NULL);
+        self->readResult = PyObject_CallMethod(self->reader, "read",
+            "k", self->readSize);
+        if (NULL == self->readResult) {
+            return -1;
+        }
+
+        memset(&buffer, 0, sizeof(buffer));
+
+        if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) {
+            return -1;
+        }
+
+        /* EOF */
+        if (0 == buffer.len) {
+            self->finishedInput = 1;
+            Py_CLEAR(self->readResult);
+        }
+        else {
+            self->input.src = buffer.buf;
+            self->input.size = buffer.len;
+            self->input.pos = 0;
+        }
+
+        PyBuffer_Release(&buffer);
+	}
+	else {
+		assert(self->buffer.buf);
+        /*
+         * We should only get here once since expectation is we always
+         * exhaust input buffer before reading again.
+         */
+        assert(self->input.src == NULL);
+
+		self->input.src = self->buffer.buf;
+        self->input.size = self->buffer.len;
+        self->input.pos = 0;
+	}
+
+	return 1;
+}
+
+/**
+ * Decompresses available input into an output buffer.
+ *
+ * Returns 0 if we need more input.
+ * Returns 1 if output buffer should be emitted.
+ * Returns -1 on error and sets a Python exception.
+ */
+int decompress_input(ZstdDecompressionReader* self, ZSTD_outBuffer* output) {
+	size_t zresult;
+
+	if (self->input.pos >= self->input.size) {
+		return 0;
+	}
+
+	Py_BEGIN_ALLOW_THREADS
+	zresult = ZSTD_decompressStream(self->decompressor->dctx, output, &self->input);
+	Py_END_ALLOW_THREADS
+
+	/* Input exhausted. Clear our state tracking. */
+	if (self->input.pos == self->input.size) {
+		memset(&self->input, 0, sizeof(self->input));
+		Py_CLEAR(self->readResult);
+
+		if (self->buffer.buf) {
+			self->finishedInput = 1;
+		}
+	}
+
+	if (ZSTD_isError(zresult)) {
+		PyErr_Format(ZstdError, "zstd decompress error: %s", ZSTD_getErrorName(zresult));
+		return -1;
+	}
+
+	/* We fulfilled the full read request. Signal to emit. */
+	if (output->pos && output->pos == output->size) {
+		return 1;
+	}
+	/* We're at the end of a frame and we aren't allowed to return data
+	   spanning frames. */
+	else if (output->pos && zresult == 0 && !self->readAcrossFrames) {
+		return 1;
+	}
+
+	/* There is more room in the output. Signal to collect more data. */
+	return 0;
+}
+
 static PyObject* reader_read(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) {
 	static char* kwlist[] = {
 		"size",
@@ -113,26 +221,30 @@
 	char* resultBuffer;
 	Py_ssize_t resultSize;
 	ZSTD_outBuffer output;
-	size_t zresult;
+	int decompressResult, readResult;
 
 	if (self->closed) {
 		PyErr_SetString(PyExc_ValueError, "stream is closed");
 		return NULL;
 	}
 
-	if (self->finishedOutput) {
-		return PyBytes_FromStringAndSize("", 0);
-	}
-
-	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "n", kwlist, &size)) {
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) {
 		return NULL;
 	}
 
-	if (size < 1) {
-		PyErr_SetString(PyExc_ValueError, "cannot read negative or size 0 amounts");
+	if (size < -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
 		return NULL;
 	}
 
+	if (size == -1) {
+		return PyObject_CallMethod((PyObject*)self, "readall", NULL);
+	}
+
+	if (self->finishedOutput || size == 0) {
+		return PyBytes_FromStringAndSize("", 0);
+	}
+
 	result = PyBytes_FromStringAndSize(NULL, size);
 	if (NULL == result) {
 		return NULL;
@@ -146,85 +258,38 @@
 
 readinput:
 
-	/* Consume input data left over from last time. */
-	if (self->input.pos < self->input.size) {
-		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_decompress_generic(self->decompressor->dctx,
-			&output, &self->input);
-		Py_END_ALLOW_THREADS
+	decompressResult = decompress_input(self, &output);
 
-		/* Input exhausted. Clear our state tracking. */
-		if (self->input.pos == self->input.size) {
-			memset(&self->input, 0, sizeof(self->input));
-			Py_CLEAR(self->readResult);
+	if (-1 == decompressResult) {
+		Py_XDECREF(result);
+		return NULL;
+	}
+	else if (0 == decompressResult) { }
+	else if (1 == decompressResult) {
+		self->bytesDecompressed += output.pos;
 
-			if (self->buffer.buf) {
-				self->finishedInput = 1;
+		if (output.pos != output.size) {
+			if (safe_pybytes_resize(&result, output.pos)) {
+				Py_XDECREF(result);
+				return NULL;
 			}
 		}
-
-		if (ZSTD_isError(zresult)) {
-			PyErr_Format(ZstdError, "zstd decompress error: %s", ZSTD_getErrorName(zresult));
-			return NULL;
-		}
-		else if (0 == zresult) {
-			self->finishedOutput = 1;
-		}
-
-		/* We fulfilled the full read request. Emit it. */
-		if (output.pos && output.pos == output.size) {
-			self->bytesDecompressed += output.size;
-			return result;
-		}
-
-		/*
-		 * There is more room in the output. Fall through to try to collect
-		 * more data so we can try to fill the output.
-		 */
+		return result;
+	}
+	else {
+		assert(0);
 	}
 
-	if (!self->finishedInput) {
-		if (self->reader) {
-			Py_buffer buffer;
-
-			assert(self->readResult == NULL);
-			self->readResult = PyObject_CallMethod(self->reader, "read",
-				"k", self->readSize);
-			if (NULL == self->readResult) {
-				return NULL;
-			}
-
-			memset(&buffer, 0, sizeof(buffer));
-
-			if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) {
-				return NULL;
-			}
+	readResult = read_decompressor_input(self);
 
-			/* EOF */
-			if (0 == buffer.len) {
-				self->finishedInput = 1;
-				Py_CLEAR(self->readResult);
-			}
-			else {
-				self->input.src = buffer.buf;
-				self->input.size = buffer.len;
-				self->input.pos = 0;
-			}
-
-			PyBuffer_Release(&buffer);
-		}
-		else {
-			assert(self->buffer.buf);
-			/*
-			 * We should only get here once since above block will exhaust
-			 * source buffer until finishedInput is set.
-			 */
-			assert(self->input.src == NULL);
-
-			self->input.src = self->buffer.buf;
-			self->input.size = self->buffer.len;
-			self->input.pos = 0;
-		}
+	if (-1 == readResult) {
+		Py_XDECREF(result);
+		return NULL;
+	}
+	else if (0 == readResult) {}
+	else if (1 == readResult) {}
+	else {
+		assert(0);
 	}
 
 	if (self->input.size) {
@@ -242,18 +307,288 @@
 	return result;
 }
 
+static PyObject* reader_read1(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) {
+	static char* kwlist[] = {
+		"size",
+		NULL
+	};
+
+	Py_ssize_t size = -1;
+	PyObject* result = NULL;
+	char* resultBuffer;
+	Py_ssize_t resultSize;
+	ZSTD_outBuffer output;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) {
+		return NULL;
+	}
+
+	if (size < -1) {
+		PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
+		return NULL;
+	}
+
+	if (self->finishedOutput || size == 0) {
+		return PyBytes_FromStringAndSize("", 0);
+	}
+
+	if (size == -1) {
+		size = ZSTD_DStreamOutSize();
+	}
+
+	result = PyBytes_FromStringAndSize(NULL, size);
+	if (NULL == result) {
+		return NULL;
+	}
+
+	PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize);
+
+	output.dst = resultBuffer;
+	output.size = resultSize;
+	output.pos = 0;
+
+	/* read1() is supposed to use at most 1 read() from the underlying stream.
+	 * However, we can't satisfy this requirement with decompression due to the
+	 * nature of how decompression works. Our strategy is to read + decompress
+	 * until we get any output, at which point we return. This satisfies the
+	 * intent of the read1() API to limit read operations.
+	 */
+	while (!self->finishedInput) {
+		int readResult, decompressResult;
+
+		readResult = read_decompressor_input(self);
+		if (-1 == readResult) {
+			Py_XDECREF(result);
+			return NULL;
+		}
+		else if (0 == readResult || 1 == readResult) { }
+		else {
+			assert(0);
+		}
+
+		decompressResult = decompress_input(self, &output);
+
+		if (-1 == decompressResult) {
+			Py_XDECREF(result);
+			return NULL;
+		}
+		else if (0 == decompressResult || 1 == decompressResult) { }
+		else {
+			assert(0);
+		}
+
+		if (output.pos) {
+		    break;
+		}
+	}
+
+	self->bytesDecompressed += output.pos;
+	if (safe_pybytes_resize(&result, output.pos)) {
+		Py_XDECREF(result);
+		return NULL;
+	}
+
+	return result;
+}
+
+static PyObject* reader_readinto(ZstdDecompressionReader* self, PyObject* args) {
+	Py_buffer dest;
+	ZSTD_outBuffer output;
+	int decompressResult, readResult;
+	PyObject* result = NULL;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (self->finishedOutput) {
+		return PyLong_FromLong(0);
+	}
+
+	if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) {
+		return NULL;
+	}
+
+	if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
+		PyErr_SetString(PyExc_ValueError,
+			"destination buffer should be contiguous and have at most one dimension");
+	    goto finally;
+	}
+
+	output.dst = dest.buf;
+	output.size = dest.len;
+	output.pos = 0;
+
+readinput:
+
+	decompressResult = decompress_input(self, &output);
+
+	if (-1 == decompressResult) {
+		goto finally;
+	}
+	else if (0 == decompressResult) { }
+	else if (1 == decompressResult) {
+		self->bytesDecompressed += output.pos;
+		result = PyLong_FromSize_t(output.pos);
+		goto finally;
+	}
+	else {
+		assert(0);
+	}
+
+	readResult = read_decompressor_input(self);
+
+	if (-1 == readResult) {
+		goto finally;
+	}
+	else if (0 == readResult) {}
+	else if (1 == readResult) {}
+	else {
+		assert(0);
+	}
+
+	if (self->input.size) {
+		goto readinput;
+	}
+
+	/* EOF */
+	self->bytesDecompressed += output.pos;
+	result = PyLong_FromSize_t(output.pos);
+
+finally:
+	PyBuffer_Release(&dest);
+
+	return result;
+}
+
+static PyObject* reader_readinto1(ZstdDecompressionReader* self, PyObject* args) {
+	Py_buffer dest;
+	ZSTD_outBuffer output;
+	PyObject* result = NULL;
+
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (self->finishedOutput) {
+		return PyLong_FromLong(0);
+	}
+
+	if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) {
+		return NULL;
+	}
+
+	if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
+		PyErr_SetString(PyExc_ValueError,
+			"destination buffer should be contiguous and have at most one dimension");
+	    goto finally;
+	}
+
+	output.dst = dest.buf;
+	output.size = dest.len;
+	output.pos = 0;
+
+	while (!self->finishedInput && !self->finishedOutput) {
+		int decompressResult, readResult;
+
+		readResult = read_decompressor_input(self);
+
+		if (-1 == readResult) {
+			goto finally;
+		}
+		else if (0 == readResult || 1 == readResult) {}
+		else {
+			assert(0);
+		}
+
+		decompressResult = decompress_input(self, &output);
+
+		if (-1 == decompressResult) {
+			goto finally;
+		}
+		else if (0 == decompressResult || 1 == decompressResult) {}
+		else {
+			assert(0);
+		}
+
+		if (output.pos) {
+			break;
+		}
+	}
+
+	self->bytesDecompressed += output.pos;
+	result = PyLong_FromSize_t(output.pos);
+
+finally:
+	PyBuffer_Release(&dest);
+
+	return result;
+}
+
 static PyObject* reader_readall(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
-	return NULL;
+	PyObject* chunks = NULL;
+	PyObject* empty = NULL;
+	PyObject* result = NULL;
+
+	/* Our strategy is to collect chunks into a list then join all the
+	 * chunks at the end. We could potentially use e.g. an io.BytesIO. But
+	 * this feels simple enough to implement and avoids potentially expensive
+	 * reallocations of large buffers.
+	 */
+	chunks = PyList_New(0);
+	if (NULL == chunks) {
+		return NULL;
+	}
+
+	while (1) {
+		PyObject* chunk = PyObject_CallMethod(self, "read", "i", 1048576);
+		if (NULL == chunk) {
+			Py_DECREF(chunks);
+			return NULL;
+		}
+
+		if (!PyBytes_Size(chunk)) {
+			Py_DECREF(chunk);
+			break;
+		}
+
+		if (PyList_Append(chunks, chunk)) {
+			Py_DECREF(chunk);
+			Py_DECREF(chunks);
+			return NULL;
+		}
+
+		Py_DECREF(chunk);
+	}
+
+	empty = PyBytes_FromStringAndSize("", 0);
+	if (NULL == empty) {
+		Py_DECREF(chunks);
+		return NULL;
+	}
+
+	result = PyObject_CallMethod(empty, "join", "O", chunks);
+
+	Py_DECREF(empty);
+	Py_DECREF(chunks);
+
+	return result;
 }
 
 static PyObject* reader_readline(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
+	set_unsupported_operation();
 	return NULL;
 }
 
 static PyObject* reader_readlines(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
+	set_unsupported_operation();
 	return NULL;
 }
 
@@ -345,12 +680,12 @@
 }
 
 static PyObject* reader_iter(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
+	set_unsupported_operation();
 	return NULL;
 }
 
 static PyObject* reader_iternext(PyObject* self) {
-	PyErr_SetNone(PyExc_NotImplementedError);
+	set_unsupported_operation();
 	return NULL;
 }
 
@@ -367,6 +702,10 @@
 	PyDoc_STR("Returns True") },
 	{ "read", (PyCFunction)reader_read, METH_VARARGS | METH_KEYWORDS,
 	PyDoc_STR("read compressed data") },
+	{ "read1", (PyCFunction)reader_read1, METH_VARARGS | METH_KEYWORDS,
+	PyDoc_STR("read compressed data") },
+	{ "readinto", (PyCFunction)reader_readinto, METH_VARARGS, NULL },
+	{ "readinto1", (PyCFunction)reader_readinto1, METH_VARARGS, NULL },
 	{ "readall", (PyCFunction)reader_readall, METH_NOARGS, PyDoc_STR("Not implemented") },
 	{ "readline", (PyCFunction)reader_readline, METH_NOARGS, PyDoc_STR("Not implemented") },
 	{ "readlines", (PyCFunction)reader_readlines, METH_NOARGS, PyDoc_STR("Not implemented") },
diff --git a/contrib/python-zstandard/c-ext/decompressionwriter.c b/contrib/python-zstandard/c-ext/decompressionwriter.c
--- a/contrib/python-zstandard/c-ext/decompressionwriter.c
+++ b/contrib/python-zstandard/c-ext/decompressionwriter.c
@@ -22,12 +22,13 @@
 }
 
 static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) {
-	if (self->entered) {
-		PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
 		return NULL;
 	}
 
-	if (ensure_dctx(self->decompressor, 1)) {
+	if (self->entered) {
+		PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
 		return NULL;
 	}
 
@@ -40,6 +41,10 @@
 static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) {
 	self->entered = 0;
 
+	if (NULL == PyObject_CallMethod((PyObject*)self, "close", NULL)) {
+		return NULL;
+	}
+
 	Py_RETURN_FALSE;
 }
 
@@ -76,9 +81,9 @@
 		goto finally;
 	}
 
-	if (!self->entered) {
-		PyErr_SetString(ZstdError, "write must be called from an active context manager");
-		goto finally;
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
 	}
 
 	output.dst = PyMem_Malloc(self->outSize);
@@ -93,9 +98,9 @@
 	input.size = source.len;
 	input.pos = 0;
 
-	while ((ssize_t)input.pos < source.len) {
+	while (input.pos < (size_t)source.len) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_decompress_generic(self->decompressor->dctx, &output, &input);
+		zresult = ZSTD_decompressStream(self->decompressor->dctx, &output, &input);
 		Py_END_ALLOW_THREADS
 
 		if (ZSTD_isError(zresult)) {
@@ -120,13 +125,94 @@
 
 	PyMem_Free(output.dst);
 
-	result = PyLong_FromSsize_t(totalWrite);
+	if (self->writeReturnRead) {
+		result = PyLong_FromSize_t(input.pos);
+	}
+	else {
+		result = PyLong_FromSsize_t(totalWrite);
+	}
 
 finally:
 	PyBuffer_Release(&source);
 	return result;
 }
 
+static PyObject* ZstdDecompressionWriter_close(ZstdDecompressionWriter* self) {
+	PyObject* result;
+
+	if (self->closed) {
+		Py_RETURN_NONE;
+	}
+
+	result = PyObject_CallMethod((PyObject*)self, "flush", NULL);
+	self->closed = 1;
+
+	if (NULL == result) {
+		return NULL;
+	}
+
+	/* Call close on underlying stream as well. */
+	if (PyObject_HasAttrString(self->writer, "close")) {
+		return PyObject_CallMethod(self->writer, "close", NULL);
+	}
+
+	Py_RETURN_NONE;
+}
+
+static PyObject* ZstdDecompressionWriter_fileno(ZstdDecompressionWriter* self) {
+	if (PyObject_HasAttrString(self->writer, "fileno")) {
+		return PyObject_CallMethod(self->writer, "fileno", NULL);
+	}
+	else {
+		PyErr_SetString(PyExc_OSError, "fileno not available on underlying writer");
+		return NULL;
+	}
+}
+
+static PyObject* ZstdDecompressionWriter_flush(ZstdDecompressionWriter* self) {
+	if (self->closed) {
+		PyErr_SetString(PyExc_ValueError, "stream is closed");
+		return NULL;
+	}
+
+	if (PyObject_HasAttrString(self->writer, "flush")) {
+		return PyObject_CallMethod(self->writer, "flush", NULL);
+	}
+	else {
+		Py_RETURN_NONE;
+	}
+}
+
+static PyObject* ZstdDecompressionWriter_false(PyObject* self, PyObject* args) {
+	Py_RETURN_FALSE;
+}
+
+static PyObject* ZstdDecompressionWriter_true(PyObject* self, PyObject* args) {
+	Py_RETURN_TRUE;
+}
+
+static PyObject* ZstdDecompressionWriter_unsupported(PyObject* self, PyObject* args, PyObject* kwargs) {
+	PyObject* iomod;
+	PyObject* exc;
+
+	iomod = PyImport_ImportModule("io");
+	if (NULL == iomod) {
+		return NULL;
+	}
+
+	exc = PyObject_GetAttrString(iomod, "UnsupportedOperation");
+	if (NULL == exc) {
+		Py_DECREF(iomod);
+		return NULL;
+	}
+
+	PyErr_SetNone(exc);
+	Py_DECREF(exc);
+	Py_DECREF(iomod);
+
+	return NULL;
+}
+
 static PyMethodDef ZstdDecompressionWriter_methods[] = {
 	{ "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS,
 	PyDoc_STR("Enter a decompression context.") },
@@ -134,11 +220,32 @@
 	PyDoc_STR("Exit a decompression context.") },
 	{ "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS,
 	PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") },
+	{ "close", (PyCFunction)ZstdDecompressionWriter_close, METH_NOARGS, NULL },
+	{ "fileno", (PyCFunction)ZstdDecompressionWriter_fileno, METH_NOARGS, NULL },
+	{ "flush", (PyCFunction)ZstdDecompressionWriter_flush, METH_NOARGS, NULL },
+	{ "isatty", ZstdDecompressionWriter_false, METH_NOARGS, NULL },
+	{ "readable", ZstdDecompressionWriter_false, METH_NOARGS, NULL },
+	{ "readline", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readlines", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "seek", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "seekable", ZstdDecompressionWriter_false, METH_NOARGS, NULL },
+	{ "tell", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "truncate", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "writable", ZstdDecompressionWriter_true, METH_NOARGS, NULL },
+	{ "writelines" , (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "read", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readall", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
+	{ "readinto", (PyCFunction)ZstdDecompressionWriter_unsupported, METH_VARARGS | METH_KEYWORDS, NULL },
 	{ "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS | METH_KEYWORDS,
 	PyDoc_STR("Compress data") },
 	{ NULL, NULL }
 };
 
+static PyMemberDef ZstdDecompressionWriter_members[] = {
+	{ "closed", T_BOOL, offsetof(ZstdDecompressionWriter, closed), READONLY, NULL },
+	{ NULL }
+};
+
 PyTypeObject ZstdDecompressionWriterType = {
 	PyVarObject_HEAD_INIT(NULL, 0)
 	"zstd.ZstdDecompressionWriter", /* tp_name */
@@ -168,7 +275,7 @@
 	0,                              /* tp_iter */
 	0,                              /* tp_iternext */
 	ZstdDecompressionWriter_methods,/* tp_methods */
-	0,                              /* tp_members */
+	ZstdDecompressionWriter_members,/* tp_members */
 	0,                              /* tp_getset */
 	0,                              /* tp_base */
 	0,                              /* tp_dict */
diff --git a/contrib/python-zstandard/c-ext/decompressobj.c b/contrib/python-zstandard/c-ext/decompressobj.c
--- a/contrib/python-zstandard/c-ext/decompressobj.c
+++ b/contrib/python-zstandard/c-ext/decompressobj.c
@@ -75,7 +75,7 @@
 
 	while (1) {
 		Py_BEGIN_ALLOW_THREADS
-		zresult = ZSTD_decompress_generic(self->decompressor->dctx, &output, &input);
+		zresult = ZSTD_decompressStream(self->decompressor->dctx, &output, &input);
 		Py_END_ALLOW_THREADS
 
 		if (ZSTD_isError(zresult)) {
@@ -130,9 +130,26 @@
 	return result;
 }
 
+static PyObject* DecompressionObj_flush(ZstdDecompressionObj* self, PyObject* args, PyObject* kwargs) {
+	static char* kwlist[] = {
+		"length",
+		NULL
+	};
+
+	PyObject* length = NULL;
+
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O:flush", kwlist, &length)) {
+	return NULL;
+	}
+
+	Py_RETURN_NONE;
+}
+
 static PyMethodDef DecompressionObj_methods[] = {
 	{ "decompress", (PyCFunction)DecompressionObj_decompress,
 	  METH_VARARGS | METH_KEYWORDS, PyDoc_STR("decompress data") },
+	{ "flush", (PyCFunction)DecompressionObj_flush,
+	  METH_VARARGS | METH_KEYWORDS, PyDoc_STR("no-op") },
 	{ NULL, NULL }
 };
 
diff --git a/contrib/python-zstandard/c-ext/decompressor.c b/contrib/python-zstandard/c-ext/decompressor.c
--- a/contrib/python-zstandard/c-ext/decompressor.c
+++ b/contrib/python-zstandard/c-ext/decompressor.c
@@ -17,7 +17,7 @@
 int ensure_dctx(ZstdDecompressor* decompressor, int loadDict) {
 	size_t zresult;
 
-	ZSTD_DCtx_reset(decompressor->dctx);
+	ZSTD_DCtx_reset(decompressor->dctx, ZSTD_reset_session_only);
 
 	if (decompressor->maxWindowSize) {
 		zresult = ZSTD_DCtx_setMaxWindowSize(decompressor->dctx, decompressor->maxWindowSize);
@@ -229,7 +229,7 @@
 
 		while (input.pos < input.size) {
 			Py_BEGIN_ALLOW_THREADS
-			zresult = ZSTD_decompress_generic(self->dctx, &output, &input);
+			zresult = ZSTD_decompressStream(self->dctx, &output, &input);
 			Py_END_ALLOW_THREADS
 
 			if (ZSTD_isError(zresult)) {
@@ -379,7 +379,7 @@
 	inBuffer.pos = 0;
 
 	Py_BEGIN_ALLOW_THREADS
-	zresult = ZSTD_decompress_generic(self->dctx, &outBuffer, &inBuffer);
+	zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer);
 	Py_END_ALLOW_THREADS
 
 	if (ZSTD_isError(zresult)) {
@@ -550,28 +550,35 @@
 }
 
 PyDoc_STRVAR(Decompressor_stream_reader__doc__,
-"stream_reader(source, [read_size=default])\n"
+"stream_reader(source, [read_size=default, [read_across_frames=False]])\n"
 "\n"
 "Obtain an object that behaves like an I/O stream that can be used for\n"
 "reading decompressed output from an object.\n"
 "\n"
 "The source object can be any object with a ``read(size)`` method or that\n"
 "conforms to the buffer protocol.\n"
+"\n"
+"``read_across_frames`` controls the behavior of ``read()`` when the end\n"
+"of a zstd frame is reached. When ``True``, ``read()`` can potentially\n"
+"return data belonging to multiple zstd frames. When ``False``, ``read()``\n"
+"will return when the end of a frame is reached.\n"
 );
 
 static ZstdDecompressionReader* Decompressor_stream_reader(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
 	static char* kwlist[] = {
 		"source",
 		"read_size",
+		"read_across_frames",
 		NULL
 	};
 
 	PyObject* source;
 	size_t readSize = ZSTD_DStreamInSize();
+	PyObject* readAcrossFrames = NULL;
 	ZstdDecompressionReader* result;
 
-	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k:stream_reader", kwlist,
-		&source, &readSize)) {
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kO:stream_reader", kwlist,
+		&source, &readSize, &readAcrossFrames)) {
 		return NULL;
 	}
 
@@ -604,6 +611,7 @@
 
 	result->decompressor = self;
 	Py_INCREF(self);
+	result->readAcrossFrames = readAcrossFrames ? PyObject_IsTrue(readAcrossFrames) : 0;
 
 	return result;
 }
@@ -625,15 +633,17 @@
 	static char* kwlist[] = {
 		"writer",
 		"write_size",
+		"write_return_read",
 		NULL
 	};
 
 	PyObject* writer;
 	size_t outSize = ZSTD_DStreamOutSize();
+	PyObject* writeReturnRead = NULL;
 	ZstdDecompressionWriter* result;
 
-	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k:stream_writer", kwlist,
-		&writer, &outSize)) {
+	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kO:stream_writer", kwlist,
+		&writer, &outSize, &writeReturnRead)) {
 		return NULL;
 	}
 
@@ -642,6 +652,10 @@
 		return NULL;
 	}
 
+	if (ensure_dctx(self, 1)) {
+		return NULL;
+	}
+
 	result = (ZstdDecompressionWriter*)PyObject_CallObject((PyObject*)&ZstdDecompressionWriterType, NULL);
 	if (!result) {
 		return NULL;
@@ -654,6 +668,7 @@
 	Py_INCREF(result->writer);
 
 	result->outSize = outSize;
+	result->writeReturnRead = writeReturnRead ? PyObject_IsTrue(writeReturnRead) : 0;
 
 	return result;
 }
@@ -756,7 +771,7 @@
 	inBuffer.pos = 0;
 
 	Py_BEGIN_ALLOW_THREADS
-	zresult = ZSTD_decompress_generic(self->dctx, &outBuffer, &inBuffer);
+	zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer);
 	Py_END_ALLOW_THREADS
 	if (ZSTD_isError(zresult)) {
 		PyErr_Format(ZstdError, "could not decompress chunk 0: %s", ZSTD_getErrorName(zresult));
@@ -852,7 +867,7 @@
 			outBuffer.pos = 0;
 
 			Py_BEGIN_ALLOW_THREADS
-			zresult = ZSTD_decompress_generic(self->dctx, &outBuffer, &inBuffer);
+			zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer);
 			Py_END_ALLOW_THREADS
 			if (ZSTD_isError(zresult)) {
 				PyErr_Format(ZstdError, "could not decompress chunk %zd: %s",
@@ -892,7 +907,7 @@
 			outBuffer.pos = 0;
 
 			Py_BEGIN_ALLOW_THREADS
-			zresult = ZSTD_decompress_generic(self->dctx, &outBuffer, &inBuffer);
+			zresult = ZSTD_decompressStream(self->dctx, &outBuffer, &inBuffer);
 			Py_END_ALLOW_THREADS
 			if (ZSTD_isError(zresult)) {
 				PyErr_Format(ZstdError, "could not decompress chunk %zd: %s",
@@ -1176,7 +1191,7 @@
 		inBuffer.size = sourceSize;
 		inBuffer.pos = 0;
 
-		zresult = ZSTD_decompress_generic(state->dctx, &outBuffer, &inBuffer);
+		zresult = ZSTD_decompressStream(state->dctx, &outBuffer, &inBuffer);
 		if (ZSTD_isError(zresult)) {
 			state->error = WorkerError_zstd;
 			state->zresult = zresult;
diff --git a/contrib/python-zstandard/c-ext/decompressoriterator.c b/contrib/python-zstandard/c-ext/decompressoriterator.c
--- a/contrib/python-zstandard/c-ext/decompressoriterator.c
+++ b/contrib/python-zstandard/c-ext/decompressoriterator.c
@@ -57,7 +57,7 @@
 	self->output.pos = 0;
 
 	Py_BEGIN_ALLOW_THREADS
-	zresult = ZSTD_decompress_generic(self->decompressor->dctx, &self->output, &self->input);
+	zresult = ZSTD_decompressStream(self->decompressor->dctx, &self->output, &self->input);
 	Py_END_ALLOW_THREADS
 
 	/* We're done with the pointer. Nullify to prevent anyone from getting a
diff --git a/contrib/python-zstandard/c-ext/python-zstandard.h b/contrib/python-zstandard/c-ext/python-zstandard.h
--- a/contrib/python-zstandard/c-ext/python-zstandard.h
+++ b/contrib/python-zstandard/c-ext/python-zstandard.h
@@ -16,7 +16,7 @@
 #include <zdict.h>
 
 /* Remember to change the string in zstandard/__init__ as well */
-#define PYTHON_ZSTANDARD_VERSION "0.10.1"
+#define PYTHON_ZSTANDARD_VERSION "0.11.0"
 
 typedef enum {
 	compressorobj_flush_finish,
@@ -31,27 +31,6 @@
 typedef struct {
 	PyObject_HEAD
 	ZSTD_CCtx_params* params;
-	unsigned format;
-	int compressionLevel;
-	unsigned windowLog;
-	unsigned hashLog;
-	unsigned chainLog;
-	unsigned searchLog;
-	unsigned minMatch;
-	unsigned targetLength;
-	unsigned compressionStrategy;
-	unsigned contentSizeFlag;
-	unsigned checksumFlag;
-	unsigned dictIDFlag;
-	unsigned threads;
-	unsigned jobSize;
-	unsigned overlapSizeLog;
-	unsigned forceMaxWindow;
-	unsigned enableLongDistanceMatching;
-	unsigned ldmHashLog;
-	unsigned ldmMinMatch;
-	unsigned ldmBucketSizeLog;
-	unsigned ldmHashEveryLog;
 } ZstdCompressionParametersObject;
 
 extern PyTypeObject ZstdCompressionParametersType;
@@ -129,9 +108,11 @@
 
 	ZstdCompressor* compressor;
 	PyObject* writer;
-	unsigned long long sourceSize;
+	ZSTD_outBuffer output;
 	size_t outSize;
 	int entered;
+	int closed;
+	int writeReturnRead;
 	unsigned long long bytesCompressed;
 } ZstdCompressionWriter;
 
@@ -235,6 +216,8 @@
 	PyObject* reader;
 	/* Size for read() operations on reader. */
 	size_t readSize;
+	/* Whether a read() can return data spanning multiple zstd frames. */
+	int readAcrossFrames;
 	/* Buffer to read from (if reading from a buffer). */
 	Py_buffer buffer;
 
@@ -267,6 +250,8 @@
 	PyObject* writer;
 	size_t outSize;
 	int entered;
+	int closed;
+	int writeReturnRead;
 } ZstdDecompressionWriter;
 
 extern PyTypeObject ZstdDecompressionWriterType;
@@ -360,8 +345,9 @@
 
 extern PyTypeObject ZstdBufferWithSegmentsCollectionType;
 
-int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned value);
+int set_parameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value);
 int set_parameters(ZSTD_CCtx_params* params, ZstdCompressionParametersObject* obj);
+int to_cparams(ZstdCompressionParametersObject* params, ZSTD_compressionParameters* cparams);
 FrameParametersObject* get_frame_parameters(PyObject* self, PyObject* args, PyObject* kwargs);
 int ensure_ddict(ZstdCompressionDict* dict);
 int ensure_dctx(ZstdDecompressor* decompressor, int loadDict);
diff --git a/contrib/python-zstandard/make_cffi.py b/contrib/python-zstandard/make_cffi.py
--- a/contrib/python-zstandard/make_cffi.py
+++ b/contrib/python-zstandard/make_cffi.py
@@ -36,7 +36,9 @@
     'compress/zstd_opt.c',
     'compress/zstdmt_compress.c',
     'decompress/huf_decompress.c',
+    'decompress/zstd_ddict.c',
     'decompress/zstd_decompress.c',
+    'decompress/zstd_decompress_block.c',
     'dictBuilder/cover.c',
     'dictBuilder/fastcover.c',
     'dictBuilder/divsufsort.c',
diff --git a/contrib/python-zstandard/setup.py b/contrib/python-zstandard/setup.py
--- a/contrib/python-zstandard/setup.py
+++ b/contrib/python-zstandard/setup.py
@@ -5,12 +5,32 @@
 # This software may be modified and distributed under the terms
 # of the BSD license. See the LICENSE file for details.
 
+from __future__ import print_function
+
+from distutils.version import LooseVersion
 import os
 import sys
 from setuptools import setup
 
+# Need change in 1.10 for ffi.from_buffer() to handle all buffer types
+# (like memoryview).
+# Need feature in 1.11 for ffi.gc() to declare size of objects so we avoid
+# garbage collection pitfalls.
+MINIMUM_CFFI_VERSION = '1.11'
+
 try:
     import cffi
+
+    # PyPy (and possibly other distros) have CFFI distributed as part of
+    # them. The install_requires for CFFI below won't work. We need to sniff
+    # out the CFFI version here and reject CFFI if it is too old.
+    cffi_version = LooseVersion(cffi.__version__)
+    if cffi_version < LooseVersion(MINIMUM_CFFI_VERSION):
+        print('CFFI 1.11 or newer required (%s found); '
+              'not building CFFI backend' % cffi_version,
+              file=sys.stderr)
+        cffi = None
+
 except ImportError:
     cffi = None
 
@@ -49,12 +69,7 @@
 if cffi:
     import make_cffi
     extensions.append(make_cffi.ffi.distutils_extension())
-
-    # Need change in 1.10 for ffi.from_buffer() to handle all buffer types
-    # (like memoryview).
-    # Need feature in 1.11 for ffi.gc() to declare size of objects so we avoid
-    # garbage collection pitfalls.
-    install_requires.append('cffi>=1.11')
+    install_requires.append('cffi>=%s' % MINIMUM_CFFI_VERSION)
 
 version = None
 
@@ -88,6 +103,7 @@
         'Programming Language :: Python :: 3.4',
         'Programming Language :: Python :: 3.5',
         'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
     ],
     keywords='zstandard zstd compression',
     packages=['zstandard'],
diff --git a/contrib/python-zstandard/setup_zstd.py b/contrib/python-zstandard/setup_zstd.py
--- a/contrib/python-zstandard/setup_zstd.py
+++ b/contrib/python-zstandard/setup_zstd.py
@@ -30,7 +30,9 @@
     'compress/zstd_opt.c',
     'compress/zstdmt_compress.c',
     'decompress/huf_decompress.c',
+    'decompress/zstd_ddict.c',
     'decompress/zstd_decompress.c',
+    'decompress/zstd_decompress_block.c',
     'dictBuilder/cover.c',
     'dictBuilder/divsufsort.c',
     'dictBuilder/fastcover.c',
diff --git a/contrib/python-zstandard/tests/common.py b/contrib/python-zstandard/tests/common.py
--- a/contrib/python-zstandard/tests/common.py
+++ b/contrib/python-zstandard/tests/common.py
@@ -79,12 +79,37 @@
     return cls
 
 
-class OpCountingBytesIO(io.BytesIO):
+class NonClosingBytesIO(io.BytesIO):
+    """BytesIO that saves the underlying buffer on close().
+
+    This allows us to access written data after close().
+    """
     def __init__(self, *args, **kwargs):
+        super(NonClosingBytesIO, self).__init__(*args, **kwargs)
+        self._saved_buffer = None
+
+    def close(self):
+        self._saved_buffer = self.getvalue()
+        return super(NonClosingBytesIO, self).close()
+
+    def getvalue(self):
+        if self.closed:
+            return self._saved_buffer
+        else:
+            return super(NonClosingBytesIO, self).getvalue()
+
+
+class OpCountingBytesIO(NonClosingBytesIO):
+    def __init__(self, *args, **kwargs):
+        self._flush_count = 0
         self._read_count = 0
         self._write_count = 0
         return super(OpCountingBytesIO, self).__init__(*args, **kwargs)
 
+    def flush(self):
+        self._flush_count += 1
+        return super(OpCountingBytesIO, self).flush()
+
     def read(self, *args):
         self._read_count += 1
         return super(OpCountingBytesIO, self).read(*args)
@@ -117,6 +142,13 @@
             except OSError:
                 pass
 
+    # Also add some actual random data.
+    _source_files.append(os.urandom(100))
+    _source_files.append(os.urandom(1000))
+    _source_files.append(os.urandom(10000))
+    _source_files.append(os.urandom(100000))
+    _source_files.append(os.urandom(1000000))
+
     return _source_files
 
 
@@ -140,12 +172,14 @@
 
 
 if hypothesis:
-    default_settings = hypothesis.settings()
+    default_settings = hypothesis.settings(deadline=10000)
     hypothesis.settings.register_profile('default', default_settings)
 
-    ci_settings = hypothesis.settings(max_examples=2500,
-                                      max_iterations=2500)
+    ci_settings = hypothesis.settings(deadline=20000, max_examples=1000)
     hypothesis.settings.register_profile('ci', ci_settings)
 
+    expensive_settings = hypothesis.settings(deadline=None, max_examples=10000)
+    hypothesis.settings.register_profile('expensive', expensive_settings)
+
     hypothesis.settings.load_profile(
         os.environ.get('HYPOTHESIS_PROFILE', 'default'))
diff --git a/contrib/python-zstandard/tests/test_buffer_util.py b/contrib/python-zstandard/tests/test_buffer_util.py
--- a/contrib/python-zstandard/tests/test_buffer_util.py
+++ b/contrib/python-zstandard/tests/test_buffer_util.py
@@ -8,6 +8,9 @@
 
 class TestBufferWithSegments(unittest.TestCase):
     def test_arguments(self):
+        if not hasattr(zstd, 'BufferWithSegments'):
+            self.skipTest('BufferWithSegments not available')
+
         with self.assertRaises(TypeError):
             zstd.BufferWithSegments()
 
@@ -19,10 +22,16 @@
             zstd.BufferWithSegments(b'foo', b'\x00\x00')
 
     def test_invalid_offset(self):
+        if not hasattr(zstd, 'BufferWithSegments'):
+            self.skipTest('BufferWithSegments not available')
+
         with self.assertRaisesRegexp(ValueError, 'offset within segments array references memory'):
             zstd.BufferWithSegments(b'foo', ss.pack(0, 4))
 
     def test_invalid_getitem(self):
+        if not hasattr(zstd, 'BufferWithSegments'):
+            self.skipTest('BufferWithSegments not available')
+
         b = zstd.BufferWithSegments(b'foo', ss.pack(0, 3))
 
         with self.assertRaisesRegexp(IndexError, 'offset must be non-negative'):
@@ -35,6 +44,9 @@
             test = b[2]
 
     def test_single(self):
+        if not hasattr(zstd, 'BufferWithSegments'):
+            self.skipTest('BufferWithSegments not available')
+
         b = zstd.BufferWithSegments(b'foo', ss.pack(0, 3))
         self.assertEqual(len(b), 1)
         self.assertEqual(b.size, 3)
@@ -45,6 +57,9 @@
         self.assertEqual(b[0].tobytes(), b'foo')
 
     def test_multiple(self):
+        if not hasattr(zstd, 'BufferWithSegments'):
+            self.skipTest('BufferWithSegments not available')
+
         b = zstd.BufferWithSegments(b'foofooxfooxy', b''.join([ss.pack(0, 3),
                                                                ss.pack(3, 4),
                                                                ss.pack(7, 5)]))
@@ -59,10 +74,16 @@
 
 class TestBufferWithSegmentsCollection(unittest.TestCase):
     def test_empty_constructor(self):
+        if not hasattr(zstd, 'BufferWithSegmentsCollection'):
+            self.skipTest('BufferWithSegmentsCollection not available')
+
         with self.assertRaisesRegexp(ValueError, 'must pass at least 1 argument'):
             zstd.BufferWithSegmentsCollection()
 
     def test_argument_validation(self):
+        if not hasattr(zstd, 'BufferWithSegmentsCollection'):
+            self.skipTest('BufferWithSegmentsCollection not available')
+
         with self.assertRaisesRegexp(TypeError, 'arguments must be BufferWithSegments'):
             zstd.BufferWithSegmentsCollection(None)
 
@@ -74,6 +95,9 @@
             zstd.BufferWithSegmentsCollection(zstd.BufferWithSegments(b'', b''))
 
     def test_length(self):
+        if not hasattr(zstd, 'BufferWithSegmentsCollection'):
+            self.skipTest('BufferWithSegmentsCollection not available')
+
         b1 = zstd.BufferWithSegments(b'foo', ss.pack(0, 3))
         b2 = zstd.BufferWithSegments(b'barbaz', b''.join([ss.pack(0, 3),
                                                           ss.pack(3, 3)]))
@@ -91,6 +115,9 @@
         self.assertEqual(c.size(), 9)
 
     def test_getitem(self):
+        if not hasattr(zstd, 'BufferWithSegmentsCollection'):
+            self.skipTest('BufferWithSegmentsCollection not available')
+
         b1 = zstd.BufferWithSegments(b'foo', ss.pack(0, 3))
         b2 = zstd.BufferWithSegments(b'barbaz', b''.join([ss.pack(0, 3),
                                                           ss.pack(3, 3)]))
diff --git a/contrib/python-zstandard/tests/test_compressor.py b/contrib/python-zstandard/tests/test_compressor.py
--- a/contrib/python-zstandard/tests/test_compressor.py
+++ b/contrib/python-zstandard/tests/test_compressor.py
@@ -1,14 +1,17 @@
 import hashlib
 import io
+import os
 import struct
 import sys
 import tarfile
+import tempfile
 import unittest
 
 import zstandard as zstd
 
 from .common import (
     make_cffi,
+    NonClosingBytesIO,
     OpCountingBytesIO,
 )
 
@@ -272,7 +275,7 @@
 
         params = zstd.get_frame_parameters(result)
         self.assertEqual(params.content_size, zstd.CONTENTSIZE_UNKNOWN)
-        self.assertEqual(params.window_size, 1048576)
+        self.assertEqual(params.window_size, 2097152)
         self.assertEqual(params.dict_id, 0)
         self.assertFalse(params.has_checksum)
 
@@ -321,7 +324,7 @@
         cobj.compress(b'foo')
         cobj.flush()
 
-        with self.assertRaisesRegexp(zstd.ZstdError, 'cannot call compress\(\) after compressor'):
+        with self.assertRaisesRegexp(zstd.ZstdError, r'cannot call compress\(\) after compressor'):
             cobj.compress(b'foo')
 
         with self.assertRaisesRegexp(zstd.ZstdError, 'compressor object already finished'):
@@ -453,7 +456,7 @@
 
         params = zstd.get_frame_parameters(dest.getvalue())
         self.assertEqual(params.content_size, zstd.CONTENTSIZE_UNKNOWN)
-        self.assertEqual(params.window_size, 1048576)
+        self.assertEqual(params.window_size, 2097152)
         self.assertEqual(params.dict_id, 0)
         self.assertFalse(params.has_checksum)
 
@@ -605,10 +608,6 @@
             with self.assertRaises(io.UnsupportedOperation):
                 reader.readlines()
 
-            # This could probably be implemented someday.
-            with self.assertRaises(NotImplementedError):
-                reader.readall()
-
             with self.assertRaises(io.UnsupportedOperation):
                 iter(reader)
 
@@ -644,15 +643,16 @@
             with self.assertRaisesRegexp(ValueError, 'stream is closed'):
                 reader.read(10)
 
-    def test_read_bad_size(self):
+    def test_read_sizes(self):
         cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
 
         with cctx.stream_reader(b'foo') as reader:
-            with self.assertRaisesRegexp(ValueError, 'cannot read negative or size 0 amounts'):
-                reader.read(-1)
+            with self.assertRaisesRegexp(ValueError, 'cannot read negative amounts less than -1'):
+                reader.read(-2)
 
-            with self.assertRaisesRegexp(ValueError, 'cannot read negative or size 0 amounts'):
-                reader.read(0)
+            self.assertEqual(reader.read(0), b'')
+            self.assertEqual(reader.read(), foo)
 
     def test_read_buffer(self):
         cctx = zstd.ZstdCompressor()
@@ -746,11 +746,202 @@
         with cctx.stream_reader(source, size=42):
             pass
 
+    def test_readall(self):
+        cctx = zstd.ZstdCompressor()
+        frame = cctx.compress(b'foo' * 1024)
+
+        reader = cctx.stream_reader(b'foo' * 1024)
+        self.assertEqual(reader.readall(), frame)
+
+    def test_readinto(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
+        reader = cctx.stream_reader(b'foo')
+        with self.assertRaises(Exception):
+            reader.readinto(b'foobar')
+
+        # readinto() with sufficiently large destination.
+        b = bytearray(1024)
+        reader = cctx.stream_reader(b'foo')
+        self.assertEqual(reader.readinto(b), len(foo))
+        self.assertEqual(b[0:len(foo)], foo)
+        self.assertEqual(reader.readinto(b), 0)
+        self.assertEqual(b[0:len(foo)], foo)
+
+        # readinto() with small reads.
+        b = bytearray(1024)
+        reader = cctx.stream_reader(b'foo', read_size=1)
+        self.assertEqual(reader.readinto(b), len(foo))
+        self.assertEqual(b[0:len(foo)], foo)
+
+        # Too small destination buffer.
+        b = bytearray(2)
+        reader = cctx.stream_reader(b'foo')
+        self.assertEqual(reader.readinto(b), 2)
+        self.assertEqual(b[:], foo[0:2])
+        self.assertEqual(reader.readinto(b), 2)
+        self.assertEqual(b[:], foo[2:4])
+        self.assertEqual(reader.readinto(b), 2)
+        self.assertEqual(b[:], foo[4:6])
+
+    def test_readinto1(self):
+        cctx = zstd.ZstdCompressor()
+        foo = b''.join(cctx.read_to_iter(io.BytesIO(b'foo')))
+
+        reader = cctx.stream_reader(b'foo')
+        with self.assertRaises(Exception):
+            reader.readinto1(b'foobar')
+
+        b = bytearray(1024)
+        source = OpCountingBytesIO(b'foo')
+        reader = cctx.stream_reader(source)
+        self.assertEqual(reader.readinto1(b), len(foo))
+        self.assertEqual(b[0:len(foo)], foo)
+        self.assertEqual(source._read_count, 2)
+
+        # readinto1() with small reads.
+        b = bytearray(1024)
+        source = OpCountingBytesIO(b'foo')
+        reader = cctx.stream_reader(source, read_size=1)
+        self.assertEqual(reader.readinto1(b), len(foo))
+        self.assertEqual(b[0:len(foo)], foo)
+        self.assertEqual(source._read_count, 4)
+
+    def test_read1(self):
+        cctx = zstd.ZstdCompressor()
+        foo = b''.join(cctx.read_to_iter(io.BytesIO(b'foo')))
+
+        b = OpCountingBytesIO(b'foo')
+        reader = cctx.stream_reader(b)
+
+        self.assertEqual(reader.read1(), foo)
+        self.assertEqual(b._read_count, 2)
+
+        b = OpCountingBytesIO(b'foo')
+        reader = cctx.stream_reader(b)
+
+        self.assertEqual(reader.read1(0), b'')
+        self.assertEqual(reader.read1(2), foo[0:2])
+        self.assertEqual(b._read_count, 2)
+        self.assertEqual(reader.read1(2), foo[2:4])
+        self.assertEqual(reader.read1(1024), foo[4:])
+
 
 @make_cffi
 class TestCompressor_stream_writer(unittest.TestCase):
+    def test_io_api(self):
+        buffer = io.BytesIO()
+        cctx = zstd.ZstdCompressor()
+        writer = cctx.stream_writer(buffer)
+
+        self.assertFalse(writer.isatty())
+        self.assertFalse(writer.readable())
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline(size=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines(hint=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.seek(0)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.seek(10, os.SEEK_SET)
+
+        self.assertFalse(writer.seekable())
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate(size=42)
+
+        self.assertTrue(writer.writable())
+
+        with self.assertRaises(NotImplementedError):
+            writer.writelines([])
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read(size=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readall()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readinto(None)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.fileno()
+
+        self.assertFalse(writer.closed)
+
+    def test_fileno_file(self):
+        with tempfile.TemporaryFile('wb') as tf:
+            cctx = zstd.ZstdCompressor()
+            writer = cctx.stream_writer(tf)
+
+            self.assertEqual(writer.fileno(), tf.fileno())
+
+    def test_close(self):
+        buffer = NonClosingBytesIO()
+        cctx = zstd.ZstdCompressor(level=1)
+        writer = cctx.stream_writer(buffer)
+
+        writer.write(b'foo' * 1024)
+        self.assertFalse(writer.closed)
+        self.assertFalse(buffer.closed)
+        writer.close()
+        self.assertTrue(writer.closed)
+        self.assertTrue(buffer.closed)
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            writer.write(b'foo')
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            writer.flush()
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            with writer:
+                pass
+
+        self.assertEqual(buffer.getvalue(),
+                         b'\x28\xb5\x2f\xfd\x00\x48\x55\x00\x00\x18\x66\x6f'
+                         b'\x6f\x01\x00\xfa\xd3\x77\x43')
+
+        # Context manager exit should close stream.
+        buffer = io.BytesIO()
+        writer = cctx.stream_writer(buffer)
+
+        with writer:
+            writer.write(b'foo')
+
+        self.assertTrue(writer.closed)
+
     def test_empty(self):
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1, write_content_size=False)
         with cctx.stream_writer(buffer) as compressor:
             compressor.write(b'')
@@ -764,6 +955,25 @@
         self.assertEqual(params.dict_id, 0)
         self.assertFalse(params.has_checksum)
 
+        # Test without context manager.
+        buffer = io.BytesIO()
+        compressor = cctx.stream_writer(buffer)
+        self.assertEqual(compressor.write(b''), 0)
+        self.assertEqual(buffer.getvalue(), b'')
+        self.assertEqual(compressor.flush(zstd.FLUSH_FRAME), 9)
+        result = buffer.getvalue()
+        self.assertEqual(result, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
+
+        params = zstd.get_frame_parameters(result)
+        self.assertEqual(params.content_size, zstd.CONTENTSIZE_UNKNOWN)
+        self.assertEqual(params.window_size, 524288)
+        self.assertEqual(params.dict_id, 0)
+        self.assertFalse(params.has_checksum)
+
+        # Test write_return_read=True
+        compressor = cctx.stream_writer(buffer, write_return_read=True)
+        self.assertEqual(compressor.write(b''), 0)
+
     def test_input_types(self):
         expected = b'\x28\xb5\x2f\xfd\x00\x48\x19\x00\x00\x66\x6f\x6f'
         cctx = zstd.ZstdCompressor(level=1)
@@ -778,14 +988,17 @@
         ]
 
         for source in sources:
-            buffer = io.BytesIO()
+            buffer = NonClosingBytesIO()
             with cctx.stream_writer(buffer) as compressor:
                 compressor.write(source)
 
             self.assertEqual(buffer.getvalue(), expected)
 
+            compressor = cctx.stream_writer(buffer, write_return_read=True)
+            self.assertEqual(compressor.write(source), len(source))
+
     def test_multiple_compress(self):
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=5)
         with cctx.stream_writer(buffer) as compressor:
             self.assertEqual(compressor.write(b'foo'), 0)
@@ -794,9 +1007,27 @@
 
         result = buffer.getvalue()
         self.assertEqual(result,
-                         b'\x28\xb5\x2f\xfd\x00\x50\x75\x00\x00\x38\x66\x6f'
+                         b'\x28\xb5\x2f\xfd\x00\x58\x75\x00\x00\x38\x66\x6f'
                          b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23')
 
+        # Test without context manager.
+        buffer = io.BytesIO()
+        compressor = cctx.stream_writer(buffer)
+        self.assertEqual(compressor.write(b'foo'), 0)
+        self.assertEqual(compressor.write(b'bar'), 0)
+        self.assertEqual(compressor.write(b'x' * 8192), 0)
+        self.assertEqual(compressor.flush(zstd.FLUSH_FRAME), 23)
+        result = buffer.getvalue()
+        self.assertEqual(result,
+                         b'\x28\xb5\x2f\xfd\x00\x58\x75\x00\x00\x38\x66\x6f'
+                         b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23')
+
+        # Test with write_return_read=True.
+        compressor = cctx.stream_writer(buffer, write_return_read=True)
+        self.assertEqual(compressor.write(b'foo'), 3)
+        self.assertEqual(compressor.write(b'barbiz'), 6)
+        self.assertEqual(compressor.write(b'x' * 8192), 8192)
+
     def test_dictionary(self):
         samples = []
         for i in range(128):
@@ -807,9 +1038,9 @@
         d = zstd.train_dictionary(8192, samples)
 
         h = hashlib.sha1(d.as_bytes()).hexdigest()
-        self.assertEqual(h, '2b3b6428da5bf2c9cc9d4bb58ba0bc5990dd0e79')
+        self.assertEqual(h, '88ca0d38332aff379d4ced166a51c280a7679aad')
 
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=9, dict_data=d)
         with cctx.stream_writer(buffer) as compressor:
             self.assertEqual(compressor.write(b'foo'), 0)
@@ -825,7 +1056,7 @@
         self.assertFalse(params.has_checksum)
 
         h = hashlib.sha1(compressed).hexdigest()
-        self.assertEqual(h, '23f88344263678478f5f82298e0a5d1833125786')
+        self.assertEqual(h, '8703b4316f274d26697ea5dd480f29c08e85d940')
 
         source = b'foo' + b'bar' + (b'foo' * 16384)
 
@@ -842,9 +1073,9 @@
             min_match=5,
             search_log=4,
             target_length=10,
-            compression_strategy=zstd.STRATEGY_FAST)
+            strategy=zstd.STRATEGY_FAST)
 
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(compression_params=params)
         with cctx.stream_writer(buffer) as compressor:
             self.assertEqual(compressor.write(b'foo'), 0)
@@ -863,12 +1094,12 @@
         self.assertEqual(h, '2a8111d72eb5004cdcecbdac37da9f26720d30ef')
 
     def test_write_checksum(self):
-        no_checksum = io.BytesIO()
+        no_checksum = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1)
         with cctx.stream_writer(no_checksum) as compressor:
             self.assertEqual(compressor.write(b'foobar'), 0)
 
-        with_checksum = io.BytesIO()
+        with_checksum = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
         with cctx.stream_writer(with_checksum) as compressor:
             self.assertEqual(compressor.write(b'foobar'), 0)
@@ -886,12 +1117,12 @@
                          len(no_checksum.getvalue()) + 4)
 
     def test_write_content_size(self):
-        no_size = io.BytesIO()
+        no_size = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1, write_content_size=False)
         with cctx.stream_writer(no_size) as compressor:
             self.assertEqual(compressor.write(b'foobar' * 256), 0)
 
-        with_size = io.BytesIO()
+        with_size = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1)
         with cctx.stream_writer(with_size) as compressor:
             self.assertEqual(compressor.write(b'foobar' * 256), 0)
@@ -902,7 +1133,7 @@
                          len(no_size.getvalue()))
 
         # Declaring size will write the header.
-        with_size = io.BytesIO()
+        with_size = NonClosingBytesIO()
         with cctx.stream_writer(with_size, size=len(b'foobar' * 256)) as compressor:
             self.assertEqual(compressor.write(b'foobar' * 256), 0)
 
@@ -927,7 +1158,7 @@
 
         d = zstd.train_dictionary(1024, samples)
 
-        with_dict_id = io.BytesIO()
+        with_dict_id = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(level=1, dict_data=d)
         with cctx.stream_writer(with_dict_id) as compressor:
             self.assertEqual(compressor.write(b'foobarfoobar'), 0)
@@ -935,7 +1166,7 @@
         self.assertEqual(with_dict_id.getvalue()[4:5], b'\x03')
 
         cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False)
-        no_dict_id = io.BytesIO()
+        no_dict_id = NonClosingBytesIO()
         with cctx.stream_writer(no_dict_id) as compressor:
             self.assertEqual(compressor.write(b'foobarfoobar'), 0)
 
@@ -1009,8 +1240,32 @@
         header = trailing[0:3]
         self.assertEqual(header, b'\x01\x00\x00')
 
+    def test_flush_frame(self):
+        cctx = zstd.ZstdCompressor(level=3)
+        dest = OpCountingBytesIO()
+
+        with cctx.stream_writer(dest) as compressor:
+            self.assertEqual(compressor.write(b'foobar' * 8192), 0)
+            self.assertEqual(compressor.flush(zstd.FLUSH_FRAME), 23)
+            compressor.write(b'biz' * 16384)
+
+        self.assertEqual(dest.getvalue(),
+                         # Frame 1.
+                         b'\x28\xb5\x2f\xfd\x00\x58\x75\x00\x00\x30\x66\x6f\x6f'
+                         b'\x62\x61\x72\x01\x00\xf7\xbf\xe8\xa5\x08'
+                         # Frame 2.
+                         b'\x28\xb5\x2f\xfd\x00\x58\x5d\x00\x00\x18\x62\x69\x7a'
+                         b'\x01\x00\xfa\x3f\x75\x37\x04')
+
+    def test_bad_flush_mode(self):
+        cctx = zstd.ZstdCompressor()
+        dest = io.BytesIO()
+        with cctx.stream_writer(dest) as compressor:
+            with self.assertRaisesRegexp(ValueError, 'unknown flush_mode: 42'):
+                compressor.flush(flush_mode=42)
+
     def test_multithreaded(self):
-        dest = io.BytesIO()
+        dest = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(threads=2)
         with cctx.stream_writer(dest) as compressor:
             compressor.write(b'a' * 1048576)
@@ -1043,22 +1298,21 @@
             pass
 
     def test_tarfile_compat(self):
-        raise unittest.SkipTest('not yet fully working')
-
-        dest = io.BytesIO()
+        dest = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor()
         with cctx.stream_writer(dest) as compressor:
-            with tarfile.open('tf', mode='w', fileobj=compressor) as tf:
+            with tarfile.open('tf', mode='w|', fileobj=compressor) as tf:
                 tf.add(__file__, 'test_compressor.py')
 
-        dest.seek(0)
+        dest = io.BytesIO(dest.getvalue())
 
         dctx = zstd.ZstdDecompressor()
         with dctx.stream_reader(dest) as reader:
-            with tarfile.open(mode='r:', fileobj=reader) as tf:
+            with tarfile.open(mode='r|', fileobj=reader) as tf:
                 for member in tf:
                     self.assertEqual(member.name, 'test_compressor.py')
 
+
 @make_cffi
 class TestCompressor_read_to_iter(unittest.TestCase):
     def test_type_validation(self):
@@ -1192,7 +1446,7 @@
 
         it = chunker.finish()
 
-        self.assertEqual(next(it), b'\x28\xb5\x2f\xfd\x00\x50\x01\x00\x00')
+        self.assertEqual(next(it), b'\x28\xb5\x2f\xfd\x00\x58\x01\x00\x00')
 
         with self.assertRaises(StopIteration):
             next(it)
@@ -1214,7 +1468,7 @@
         it = chunker.finish()
 
         self.assertEqual(next(it),
-                         b'\x28\xb5\x2f\xfd\x00\x50\x7d\x00\x00\x48\x66\x6f'
+                         b'\x28\xb5\x2f\xfd\x00\x58\x7d\x00\x00\x48\x66\x6f'
                          b'\x6f\x62\x61\x72\x62\x61\x7a\x01\x00\xe4\xe4\x8e')
 
         with self.assertRaises(StopIteration):
@@ -1258,7 +1512,7 @@
 
         self.assertEqual(
             b''.join(chunks),
-            b'\x28\xb5\x2f\xfd\x00\x50\x55\x00\x00\x18\x66\x6f\x6f\x01\x00'
+            b'\x28\xb5\x2f\xfd\x00\x58\x55\x00\x00\x18\x66\x6f\x6f\x01\x00'
             b'\xfa\xd3\x77\x43')
 
         dctx = zstd.ZstdDecompressor()
@@ -1283,7 +1537,7 @@
 
             self.assertEqual(list(chunker.compress(source)), [])
             self.assertEqual(list(chunker.finish()), [
-                b'\x28\xb5\x2f\xfd\x00\x50\x19\x00\x00\x66\x6f\x6f'
+                b'\x28\xb5\x2f\xfd\x00\x58\x19\x00\x00\x66\x6f\x6f'
             ])
 
     def test_flush(self):
@@ -1296,7 +1550,7 @@
         chunks1 = list(chunker.flush())
 
         self.assertEqual(chunks1, [
-            b'\x28\xb5\x2f\xfd\x00\x50\x8c\x00\x00\x30\x66\x6f\x6f\x62\x61\x72'
+            b'\x28\xb5\x2f\xfd\x00\x58\x8c\x00\x00\x30\x66\x6f\x6f\x62\x61\x72'
             b'\x02\x00\xfa\x03\xfe\xd0\x9f\xbe\x1b\x02'
         ])
 
@@ -1326,7 +1580,7 @@
 
         with self.assertRaisesRegexp(
                 zstd.ZstdError,
-                'cannot call compress\(\) after compression finished'):
+                r'cannot call compress\(\) after compression finished'):
             list(chunker.compress(b'foo'))
 
     def test_flush_after_finish(self):
@@ -1338,7 +1592,7 @@
 
         with self.assertRaisesRegexp(
                 zstd.ZstdError,
-                'cannot call flush\(\) after compression finished'):
+                r'cannot call flush\(\) after compression finished'):
             list(chunker.flush())
 
     def test_finish_after_finish(self):
@@ -1350,7 +1604,7 @@
 
         with self.assertRaisesRegexp(
                 zstd.ZstdError,
-                'cannot call finish\(\) after compression finished'):
+                r'cannot call finish\(\) after compression finished'):
             list(chunker.finish())
 
 
@@ -1358,6 +1612,9 @@
     def test_invalid_inputs(self):
         cctx = zstd.ZstdCompressor()
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         with self.assertRaises(TypeError):
             cctx.multi_compress_to_buffer(True)
 
@@ -1370,6 +1627,9 @@
     def test_empty_input(self):
         cctx = zstd.ZstdCompressor()
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         with self.assertRaisesRegexp(ValueError, 'no source elements found'):
             cctx.multi_compress_to_buffer([])
 
@@ -1379,6 +1639,9 @@
     def test_list_input(self):
         cctx = zstd.ZstdCompressor(write_checksum=True)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         original = [b'foo' * 12, b'bar' * 6]
         frames = [cctx.compress(c) for c in original]
         b = cctx.multi_compress_to_buffer(original)
@@ -1394,6 +1657,9 @@
     def test_buffer_with_segments_input(self):
         cctx = zstd.ZstdCompressor(write_checksum=True)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         original = [b'foo' * 4, b'bar' * 6]
         frames = [cctx.compress(c) for c in original]
 
@@ -1412,6 +1678,9 @@
     def test_buffer_with_segments_collection_input(self):
         cctx = zstd.ZstdCompressor(write_checksum=True)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         original = [
             b'foo1',
             b'foo2' * 2,
@@ -1449,6 +1718,9 @@
 
         cctx = zstd.ZstdCompressor(write_checksum=True)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         frames = []
         frames.extend(b'x' * 64 for i in range(256))
         frames.extend(b'y' * 64 for i in range(256))
diff --git a/contrib/python-zstandard/tests/test_compressor_fuzzing.py b/contrib/python-zstandard/tests/test_compressor_fuzzing.py
--- a/contrib/python-zstandard/tests/test_compressor_fuzzing.py
+++ b/contrib/python-zstandard/tests/test_compressor_fuzzing.py
@@ -12,6 +12,7 @@
 
 from . common import (
     make_cffi,
+    NonClosingBytesIO,
     random_input_data,
 )
 
@@ -19,6 +20,62 @@
 @unittest.skipUnless('ZSTD_SLOW_TESTS' in os.environ, 'ZSTD_SLOW_TESTS not set')
 @make_cffi
 class TestCompressor_stream_reader_fuzzing(unittest.TestCase):
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(-1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_stream_source_read(self, original, level, source_read_size,
+                                read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                chunk = reader.read(read_size)
+                if not chunk:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(-1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_buffer_source_read(self, original, level, source_read_size,
+                                read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                chunk = reader.read(read_size)
+                if not chunk:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
     @hypothesis.given(original=strategies.sampled_from(random_input_data()),
                       level=strategies.integers(min_value=1, max_value=5),
                       source_read_size=strategies.integers(1, 16384),
@@ -33,15 +90,17 @@
                                 read_size=source_read_size) as reader:
             chunks = []
             while True:
-                read_size = read_sizes.draw(strategies.integers(1, 16384))
+                read_size = read_sizes.draw(strategies.integers(-1, 16384))
                 chunk = reader.read(read_size)
+                if not chunk and read_size:
+                    break
 
-                if not chunk:
-                    break
                 chunks.append(chunk)
 
         self.assertEqual(b''.join(chunks), ref_frame)
 
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
     @hypothesis.given(original=strategies.sampled_from(random_input_data()),
                       level=strategies.integers(min_value=1, max_value=5),
                       source_read_size=strategies.integers(1, 16384),
@@ -57,14 +116,343 @@
                                 read_size=source_read_size) as reader:
             chunks = []
             while True:
+                read_size = read_sizes.draw(strategies.integers(-1, 16384))
+                chunk = reader.read(read_size)
+                if not chunk and read_size:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_stream_source_readinto(self, original, level,
+                                    source_read_size, read_size):
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                b = bytearray(read_size)
+                count = reader.readinto(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_buffer_source_readinto(self, original, level,
+                                    source_read_size, read_size):
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                b = bytearray(read_size)
+                count = reader.readinto(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_stream_source_readinto_variance(self, original, level,
+                                             source_read_size, read_sizes):
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
                 read_size = read_sizes.draw(strategies.integers(1, 16384))
-                chunk = reader.read(read_size)
+                b = bytearray(read_size)
+                count = reader.readinto(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_buffer_source_readinto_variance(self, original, level,
+                                             source_read_size, read_sizes):
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                read_size = read_sizes.draw(strategies.integers(1, 16384))
+                b = bytearray(read_size)
+                count = reader.readinto(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(-1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_stream_source_read1(self, original, level, source_read_size,
+                                 read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                chunk = reader.read1(read_size)
                 if not chunk:
                     break
+
                 chunks.append(chunk)
 
         self.assertEqual(b''.join(chunks), ref_frame)
 
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(-1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_buffer_source_read1(self, original, level, source_read_size,
+                                 read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                chunk = reader.read1(read_size)
+                if not chunk:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_stream_source_read1_variance(self, original, level, source_read_size,
+                                          read_sizes):
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                read_size = read_sizes.draw(strategies.integers(-1, 16384))
+                chunk = reader.read1(read_size)
+                if not chunk and read_size:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_buffer_source_read1_variance(self, original, level, source_read_size,
+                                          read_sizes):
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                read_size = read_sizes.draw(strategies.integers(-1, 16384))
+                chunk = reader.read1(read_size)
+                if not chunk and read_size:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_stream_source_readinto1(self, original, level, source_read_size,
+                                     read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                b = bytearray(read_size)
+                count = reader.readinto1(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_size=strategies.integers(1, zstd.COMPRESSION_RECOMMENDED_OUTPUT_SIZE))
+    def test_buffer_source_readinto1(self, original, level, source_read_size,
+                                     read_size):
+        if read_size == 0:
+            read_size = -1
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                b = bytearray(read_size)
+                count = reader.readinto1(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_stream_source_readinto1_variance(self, original, level, source_read_size,
+                                              read_sizes):
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(io.BytesIO(original), size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                read_size = read_sizes.draw(strategies.integers(1, 16384))
+                b = bytearray(read_size)
+                count = reader.readinto1(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      source_read_size=strategies.integers(1, 16384),
+                      read_sizes=strategies.data())
+    def test_buffer_source_readinto1_variance(self, original, level, source_read_size,
+                                              read_sizes):
+
+        refctx = zstd.ZstdCompressor(level=level)
+        ref_frame = refctx.compress(original)
+
+        cctx = zstd.ZstdCompressor(level=level)
+        with cctx.stream_reader(original, size=len(original),
+                                read_size=source_read_size) as reader:
+            chunks = []
+            while True:
+                read_size = read_sizes.draw(strategies.integers(1, 16384))
+                b = bytearray(read_size)
+                count = reader.readinto1(b)
+
+                if not count:
+                    break
+
+                chunks.append(bytes(b[0:count]))
+
+        self.assertEqual(b''.join(chunks), ref_frame)
+
+
 
 @unittest.skipUnless('ZSTD_SLOW_TESTS' in os.environ, 'ZSTD_SLOW_TESTS not set')
 @make_cffi
@@ -77,7 +465,7 @@
         ref_frame = refctx.compress(original)
 
         cctx = zstd.ZstdCompressor(level=level)
-        b = io.BytesIO()
+        b = NonClosingBytesIO()
         with cctx.stream_writer(b, size=len(original), write_size=write_size) as compressor:
             compressor.write(original)
 
@@ -219,6 +607,9 @@
                                    write_checksum=True,
                                    **kwargs)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         result = cctx.multi_compress_to_buffer(original, threads=-1)
 
         self.assertEqual(len(result), len(original))
diff --git a/contrib/python-zstandard/tests/test_data_structures.py b/contrib/python-zstandard/tests/test_data_structures.py
--- a/contrib/python-zstandard/tests/test_data_structures.py
+++ b/contrib/python-zstandard/tests/test_data_structures.py
@@ -15,17 +15,17 @@
                                        chain_log=zstd.CHAINLOG_MIN,
                                        hash_log=zstd.HASHLOG_MIN,
                                        search_log=zstd.SEARCHLOG_MIN,
-                                       min_match=zstd.SEARCHLENGTH_MIN + 1,
+                                       min_match=zstd.MINMATCH_MIN + 1,
                                        target_length=zstd.TARGETLENGTH_MIN,
-                                       compression_strategy=zstd.STRATEGY_FAST)
+                                       strategy=zstd.STRATEGY_FAST)
 
         zstd.ZstdCompressionParameters(window_log=zstd.WINDOWLOG_MAX,
                                        chain_log=zstd.CHAINLOG_MAX,
                                        hash_log=zstd.HASHLOG_MAX,
                                        search_log=zstd.SEARCHLOG_MAX,
-                                       min_match=zstd.SEARCHLENGTH_MAX - 1,
+                                       min_match=zstd.MINMATCH_MAX - 1,
                                        target_length=zstd.TARGETLENGTH_MAX,
-                                       compression_strategy=zstd.STRATEGY_BTULTRA)
+                                       strategy=zstd.STRATEGY_BTULTRA2)
 
     def test_from_level(self):
         p = zstd.ZstdCompressionParameters.from_level(1)
@@ -43,7 +43,7 @@
                                            search_log=4,
                                            min_match=5,
                                            target_length=8,
-                                           compression_strategy=1)
+                                           strategy=1)
         self.assertEqual(p.window_log, 10)
         self.assertEqual(p.chain_log, 6)
         self.assertEqual(p.hash_log, 7)
@@ -59,9 +59,10 @@
         self.assertEqual(p.threads, 4)
 
         p = zstd.ZstdCompressionParameters(threads=2, job_size=1048576,
-                                       overlap_size_log=6)
+                                           overlap_log=6)
         self.assertEqual(p.threads, 2)
         self.assertEqual(p.job_size, 1048576)
+        self.assertEqual(p.overlap_log, 6)
         self.assertEqual(p.overlap_size_log, 6)
 
         p = zstd.ZstdCompressionParameters(compression_level=-1)
@@ -85,8 +86,9 @@
         p = zstd.ZstdCompressionParameters(ldm_bucket_size_log=7)
         self.assertEqual(p.ldm_bucket_size_log, 7)
 
-        p = zstd.ZstdCompressionParameters(ldm_hash_every_log=8)
+        p = zstd.ZstdCompressionParameters(ldm_hash_rate_log=8)
         self.assertEqual(p.ldm_hash_every_log, 8)
+        self.assertEqual(p.ldm_hash_rate_log, 8)
 
     def test_estimated_compression_context_size(self):
         p = zstd.ZstdCompressionParameters(window_log=20,
@@ -95,12 +97,44 @@
                                            search_log=1,
                                            min_match=5,
                                            target_length=16,
-                                           compression_strategy=zstd.STRATEGY_DFAST)
+                                           strategy=zstd.STRATEGY_DFAST)
 
         # 32-bit has slightly different values from 64-bit.
         self.assertAlmostEqual(p.estimated_compression_context_size(), 1294072,
                                delta=250)
 
+    def test_strategy(self):
+        with self.assertRaisesRegexp(ValueError, 'cannot specify both compression_strategy'):
+            zstd.ZstdCompressionParameters(strategy=0, compression_strategy=0)
+
+        p = zstd.ZstdCompressionParameters(strategy=2)
+        self.assertEqual(p.compression_strategy, 2)
+
+        p = zstd.ZstdCompressionParameters(strategy=3)
+        self.assertEqual(p.compression_strategy, 3)
+
+    def test_ldm_hash_rate_log(self):
+        with self.assertRaisesRegexp(ValueError, 'cannot specify both ldm_hash_rate_log'):
+            zstd.ZstdCompressionParameters(ldm_hash_rate_log=8, ldm_hash_every_log=4)
+
+        p = zstd.ZstdCompressionParameters(ldm_hash_rate_log=8)
+        self.assertEqual(p.ldm_hash_every_log, 8)
+
+        p = zstd.ZstdCompressionParameters(ldm_hash_every_log=16)
+        self.assertEqual(p.ldm_hash_every_log, 16)
+
+    def test_overlap_log(self):
+        with self.assertRaisesRegexp(ValueError, 'cannot specify both overlap_log'):
+            zstd.ZstdCompressionParameters(overlap_log=1, overlap_size_log=9)
+
+        p = zstd.ZstdCompressionParameters(overlap_log=2)
+        self.assertEqual(p.overlap_log, 2)
+        self.assertEqual(p.overlap_size_log, 2)
+
+        p = zstd.ZstdCompressionParameters(overlap_size_log=4)
+        self.assertEqual(p.overlap_log, 4)
+        self.assertEqual(p.overlap_size_log, 4)
+
 
 @make_cffi
 class TestFrameParameters(unittest.TestCase):
diff --git a/contrib/python-zstandard/tests/test_data_structures_fuzzing.py b/contrib/python-zstandard/tests/test_data_structures_fuzzing.py
--- a/contrib/python-zstandard/tests/test_data_structures_fuzzing.py
+++ b/contrib/python-zstandard/tests/test_data_structures_fuzzing.py
@@ -24,8 +24,8 @@
                                 max_value=zstd.HASHLOG_MAX)
 s_searchlog = strategies.integers(min_value=zstd.SEARCHLOG_MIN,
                                     max_value=zstd.SEARCHLOG_MAX)
-s_searchlength = strategies.integers(min_value=zstd.SEARCHLENGTH_MIN,
-                                     max_value=zstd.SEARCHLENGTH_MAX)
+s_minmatch = strategies.integers(min_value=zstd.MINMATCH_MIN,
+                                 max_value=zstd.MINMATCH_MAX)
 s_targetlength = strategies.integers(min_value=zstd.TARGETLENGTH_MIN,
                                      max_value=zstd.TARGETLENGTH_MAX)
 s_strategy = strategies.sampled_from((zstd.STRATEGY_FAST,
@@ -35,41 +35,42 @@
                                         zstd.STRATEGY_LAZY2,
                                         zstd.STRATEGY_BTLAZY2,
                                         zstd.STRATEGY_BTOPT,
-                                        zstd.STRATEGY_BTULTRA))
+                                        zstd.STRATEGY_BTULTRA,
+                                        zstd.STRATEGY_BTULTRA2))
 
 
 @make_cffi
 @unittest.skipUnless('ZSTD_SLOW_TESTS' in os.environ, 'ZSTD_SLOW_TESTS not set')
 class TestCompressionParametersHypothesis(unittest.TestCase):
     @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
-                        s_searchlength, s_targetlength, s_strategy)
+                        s_minmatch, s_targetlength, s_strategy)
     def test_valid_init(self, windowlog, chainlog, hashlog, searchlog,
-                        searchlength, targetlength, strategy):
+                        minmatch, targetlength, strategy):
         zstd.ZstdCompressionParameters(window_log=windowlog,
                                        chain_log=chainlog,
                                        hash_log=hashlog,
                                        search_log=searchlog,
-                                       min_match=searchlength,
+                                       min_match=minmatch,
                                        target_length=targetlength,
-                                       compression_strategy=strategy)
+                                       strategy=strategy)
 
     @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
-                        s_searchlength, s_targetlength, s_strategy)
+                      s_minmatch, s_targetlength, s_strategy)
     def test_estimated_compression_context_size(self, windowlog, chainlog,
                                                 hashlog, searchlog,
-                                                searchlength, targetlength,
+                                                minmatch, targetlength,
                                                 strategy):
-        if searchlength == zstd.SEARCHLENGTH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY):
-            searchlength += 1
-        elif searchlength == zstd.SEARCHLENGTH_MAX and strategy != zstd.STRATEGY_FAST:
-            searchlength -= 1
+        if minmatch == zstd.MINMATCH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY):
+            minmatch += 1
+        elif minmatch == zstd.MINMATCH_MAX and strategy != zstd.STRATEGY_FAST:
+            minmatch -= 1
 
         p = zstd.ZstdCompressionParameters(window_log=windowlog,
                                            chain_log=chainlog,
                                            hash_log=hashlog,
                                            search_log=searchlog,
-                                           min_match=searchlength,
+                                           min_match=minmatch,
                                            target_length=targetlength,
-                                           compression_strategy=strategy)
+                                           strategy=strategy)
         size = p.estimated_compression_context_size()
 
diff --git a/contrib/python-zstandard/tests/test_decompressor.py b/contrib/python-zstandard/tests/test_decompressor.py
--- a/contrib/python-zstandard/tests/test_decompressor.py
+++ b/contrib/python-zstandard/tests/test_decompressor.py
@@ -3,6 +3,7 @@
 import random
 import struct
 import sys
+import tempfile
 import unittest
 
 import zstandard as zstd
@@ -10,6 +11,7 @@
 from .common import (
     generate_samples,
     make_cffi,
+    NonClosingBytesIO,
     OpCountingBytesIO,
 )
 
@@ -219,7 +221,7 @@
         cctx = zstd.ZstdCompressor(write_content_size=False)
         frame = cctx.compress(source)
 
-        dctx = zstd.ZstdDecompressor(max_window_size=1)
+        dctx = zstd.ZstdDecompressor(max_window_size=2**zstd.WINDOWLOG_MIN)
 
         with self.assertRaisesRegexp(
             zstd.ZstdError, 'decompression error: Frame requires too much memory'):
@@ -302,19 +304,16 @@
         dctx = zstd.ZstdDecompressor()
 
         with dctx.stream_reader(b'foo') as reader:
-            with self.assertRaises(NotImplementedError):
+            with self.assertRaises(io.UnsupportedOperation):
                 reader.readline()
 
-            with self.assertRaises(NotImplementedError):
+            with self.assertRaises(io.UnsupportedOperation):
                 reader.readlines()
 
-            with self.assertRaises(NotImplementedError):
-                reader.readall()
-
-            with self.assertRaises(NotImplementedError):
+            with self.assertRaises(io.UnsupportedOperation):
                 iter(reader)
 
-            with self.assertRaises(NotImplementedError):
+            with self.assertRaises(io.UnsupportedOperation):
                 next(reader)
 
             with self.assertRaises(io.UnsupportedOperation):
@@ -347,15 +346,18 @@
             with self.assertRaisesRegexp(ValueError, 'stream is closed'):
                 reader.read(1)
 
-    def test_bad_read_size(self):
+    def test_read_sizes(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
         dctx = zstd.ZstdDecompressor()
 
-        with dctx.stream_reader(b'foo') as reader:
-            with self.assertRaisesRegexp(ValueError, 'cannot read negative or size 0 amounts'):
-                reader.read(-1)
+        with dctx.stream_reader(foo) as reader:
+            with self.assertRaisesRegexp(ValueError, 'cannot read negative amounts less than -1'):
+                reader.read(-2)
 
-            with self.assertRaisesRegexp(ValueError, 'cannot read negative or size 0 amounts'):
-                reader.read(0)
+            self.assertEqual(reader.read(0), b'')
+            self.assertEqual(reader.read(), b'foo')
 
     def test_read_buffer(self):
         cctx = zstd.ZstdCompressor()
@@ -524,13 +526,243 @@
         reader = dctx.stream_reader(source)
 
         with reader:
-            with self.assertRaises(TypeError):
-                reader.read()
+            reader.read(0)
 
         with reader:
             with self.assertRaisesRegexp(ValueError, 'stream is closed'):
                 reader.read(100)
 
+    def test_partial_read(self):
+        # Inspired by https://github.com/indygreg/python-zstandard/issues/71.
+        buffer = io.BytesIO()
+        cctx = zstd.ZstdCompressor()
+        writer = cctx.stream_writer(buffer)
+        writer.write(bytearray(os.urandom(1000000)))
+        writer.flush(zstd.FLUSH_FRAME)
+        buffer.seek(0)
+
+        dctx = zstd.ZstdDecompressor()
+        reader = dctx.stream_reader(buffer)
+
+        while True:
+            chunk = reader.read(8192)
+            if not chunk:
+                break
+
+    def test_read_multiple_frames(self):
+        cctx = zstd.ZstdCompressor()
+        source = io.BytesIO()
+        writer = cctx.stream_writer(source)
+        writer.write(b'foo')
+        writer.flush(zstd.FLUSH_FRAME)
+        writer.write(b'bar')
+        writer.flush(zstd.FLUSH_FRAME)
+
+        dctx = zstd.ZstdDecompressor()
+
+        reader = dctx.stream_reader(source.getvalue())
+        self.assertEqual(reader.read(2), b'fo')
+        self.assertEqual(reader.read(2), b'o')
+        self.assertEqual(reader.read(2), b'ba')
+        self.assertEqual(reader.read(2), b'r')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source)
+        self.assertEqual(reader.read(2), b'fo')
+        self.assertEqual(reader.read(2), b'o')
+        self.assertEqual(reader.read(2), b'ba')
+        self.assertEqual(reader.read(2), b'r')
+
+        reader = dctx.stream_reader(source.getvalue())
+        self.assertEqual(reader.read(3), b'foo')
+        self.assertEqual(reader.read(3), b'bar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source)
+        self.assertEqual(reader.read(3), b'foo')
+        self.assertEqual(reader.read(3), b'bar')
+
+        reader = dctx.stream_reader(source.getvalue())
+        self.assertEqual(reader.read(4), b'foo')
+        self.assertEqual(reader.read(4), b'bar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source)
+        self.assertEqual(reader.read(4), b'foo')
+        self.assertEqual(reader.read(4), b'bar')
+
+        reader = dctx.stream_reader(source.getvalue())
+        self.assertEqual(reader.read(128), b'foo')
+        self.assertEqual(reader.read(128), b'bar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source)
+        self.assertEqual(reader.read(128), b'foo')
+        self.assertEqual(reader.read(128), b'bar')
+
+        # Now tests for reads spanning frames.
+        reader = dctx.stream_reader(source.getvalue(), read_across_frames=True)
+        self.assertEqual(reader.read(3), b'foo')
+        self.assertEqual(reader.read(3), b'bar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source, read_across_frames=True)
+        self.assertEqual(reader.read(3), b'foo')
+        self.assertEqual(reader.read(3), b'bar')
+
+        reader = dctx.stream_reader(source.getvalue(), read_across_frames=True)
+        self.assertEqual(reader.read(6), b'foobar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source, read_across_frames=True)
+        self.assertEqual(reader.read(6), b'foobar')
+
+        reader = dctx.stream_reader(source.getvalue(), read_across_frames=True)
+        self.assertEqual(reader.read(7), b'foobar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source, read_across_frames=True)
+        self.assertEqual(reader.read(7), b'foobar')
+
+        reader = dctx.stream_reader(source.getvalue(), read_across_frames=True)
+        self.assertEqual(reader.read(128), b'foobar')
+
+        source.seek(0)
+        reader = dctx.stream_reader(source, read_across_frames=True)
+        self.assertEqual(reader.read(128), b'foobar')
+
+    def test_readinto(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
+        dctx = zstd.ZstdDecompressor()
+
+        # Attempting to readinto() a non-writable buffer fails.
+        # The exact exception varies based on the backend.
+        reader = dctx.stream_reader(foo)
+        with self.assertRaises(Exception):
+            reader.readinto(b'foobar')
+
+        # readinto() with sufficiently large destination.
+        b = bytearray(1024)
+        reader = dctx.stream_reader(foo)
+        self.assertEqual(reader.readinto(b), 3)
+        self.assertEqual(b[0:3], b'foo')
+        self.assertEqual(reader.readinto(b), 0)
+        self.assertEqual(b[0:3], b'foo')
+
+        # readinto() with small reads.
+        b = bytearray(1024)
+        reader = dctx.stream_reader(foo, read_size=1)
+        self.assertEqual(reader.readinto(b), 3)
+        self.assertEqual(b[0:3], b'foo')
+
+        # Too small destination buffer.
+        b = bytearray(2)
+        reader = dctx.stream_reader(foo)
+        self.assertEqual(reader.readinto(b), 2)
+        self.assertEqual(b[:], b'fo')
+
+    def test_readinto1(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
+        dctx = zstd.ZstdDecompressor()
+
+        reader = dctx.stream_reader(foo)
+        with self.assertRaises(Exception):
+            reader.readinto1(b'foobar')
+
+        # Sufficiently large destination.
+        b = bytearray(1024)
+        reader = dctx.stream_reader(foo)
+        self.assertEqual(reader.readinto1(b), 3)
+        self.assertEqual(b[0:3], b'foo')
+        self.assertEqual(reader.readinto1(b), 0)
+        self.assertEqual(b[0:3], b'foo')
+
+        # readinto() with small reads.
+        b = bytearray(1024)
+        reader = dctx.stream_reader(foo, read_size=1)
+        self.assertEqual(reader.readinto1(b), 3)
+        self.assertEqual(b[0:3], b'foo')
+
+        # Too small destination buffer.
+        b = bytearray(2)
+        reader = dctx.stream_reader(foo)
+        self.assertEqual(reader.readinto1(b), 2)
+        self.assertEqual(b[:], b'fo')
+
+    def test_readall(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
+        dctx = zstd.ZstdDecompressor()
+        reader = dctx.stream_reader(foo)
+
+        self.assertEqual(reader.readall(), b'foo')
+
+    def test_read1(self):
+        cctx = zstd.ZstdCompressor()
+        foo = cctx.compress(b'foo')
+
+        dctx = zstd.ZstdDecompressor()
+
+        b = OpCountingBytesIO(foo)
+        reader = dctx.stream_reader(b)
+
+        self.assertEqual(reader.read1(), b'foo')
+        self.assertEqual(b._read_count, 1)
+
+        b = OpCountingBytesIO(foo)
+        reader = dctx.stream_reader(b)
+
+        self.assertEqual(reader.read1(0), b'')
+        self.assertEqual(reader.read1(2), b'fo')
+        self.assertEqual(b._read_count, 1)
+        self.assertEqual(reader.read1(1), b'o')
+        self.assertEqual(b._read_count, 1)
+        self.assertEqual(reader.read1(1), b'')
+        self.assertEqual(b._read_count, 2)
+
+    def test_read_lines(self):
+        cctx = zstd.ZstdCompressor()
+        source = b'\n'.join(('line %d' % i).encode('ascii') for i in range(1024))
+
+        frame = cctx.compress(source)
+
+        dctx = zstd.ZstdDecompressor()
+        reader = dctx.stream_reader(frame)
+        tr = io.TextIOWrapper(reader, encoding='utf-8')
+
+        lines = []
+        for line in tr:
+            lines.append(line.encode('utf-8'))
+
+        self.assertEqual(len(lines), 1024)
+        self.assertEqual(b''.join(lines), source)
+
+        reader = dctx.stream_reader(frame)
+        tr = io.TextIOWrapper(reader, encoding='utf-8')
+
+        lines = tr.readlines()
+        self.assertEqual(len(lines), 1024)
+        self.assertEqual(''.join(lines).encode('utf-8'), source)
+
+        reader = dctx.stream_reader(frame)
+        tr = io.TextIOWrapper(reader, encoding='utf-8')
+
+        lines = []
+        while True:
+            line = tr.readline()
+            if not line:
+                break
+
+            lines.append(line.encode('utf-8'))
+
+        self.assertEqual(len(lines), 1024)
+        self.assertEqual(b''.join(lines), source)
+
 
 @make_cffi
 class TestDecompressor_decompressobj(unittest.TestCase):
@@ -540,6 +772,9 @@
         dctx = zstd.ZstdDecompressor()
         dobj = dctx.decompressobj()
         self.assertEqual(dobj.decompress(data), b'foobar')
+        self.assertIsNone(dobj.flush())
+        self.assertIsNone(dobj.flush(10))
+        self.assertIsNone(dobj.flush(length=100))
 
     def test_input_types(self):
         compressed = zstd.ZstdCompressor(level=1).compress(b'foo')
@@ -557,7 +792,11 @@
 
         for source in sources:
             dobj = dctx.decompressobj()
+            self.assertIsNone(dobj.flush())
+            self.assertIsNone(dobj.flush(10))
+            self.assertIsNone(dobj.flush(length=100))
             self.assertEqual(dobj.decompress(source), b'foo')
+            self.assertIsNone(dobj.flush())
 
     def test_reuse(self):
         data = zstd.ZstdCompressor(level=1).compress(b'foobar')
@@ -568,6 +807,7 @@
 
         with self.assertRaisesRegexp(zstd.ZstdError, 'cannot use a decompressobj'):
             dobj.decompress(data)
+            self.assertIsNone(dobj.flush())
 
     def test_bad_write_size(self):
         dctx = zstd.ZstdDecompressor()
@@ -585,16 +825,141 @@
             dobj = dctx.decompressobj(write_size=i + 1)
             self.assertEqual(dobj.decompress(data), source)
 
+
 def decompress_via_writer(data):
     buffer = io.BytesIO()
     dctx = zstd.ZstdDecompressor()
-    with dctx.stream_writer(buffer) as decompressor:
-        decompressor.write(data)
+    decompressor = dctx.stream_writer(buffer)
+    decompressor.write(data)
+
     return buffer.getvalue()
 
 
 @make_cffi
 class TestDecompressor_stream_writer(unittest.TestCase):
+    def test_io_api(self):
+        buffer = io.BytesIO()
+        dctx = zstd.ZstdDecompressor()
+        writer = dctx.stream_writer(buffer)
+
+        self.assertFalse(writer.closed)
+        self.assertFalse(writer.isatty())
+        self.assertFalse(writer.readable())
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readline(size=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readlines(hint=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.seek(0)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.seek(10, os.SEEK_SET)
+
+        self.assertFalse(writer.seekable())
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.tell()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.truncate(size=42)
+
+        self.assertTrue(writer.writable())
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.writelines([])
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read(42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.read(size=42)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readall()
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.readinto(None)
+
+        with self.assertRaises(io.UnsupportedOperation):
+            writer.fileno()
+
+    def test_fileno_file(self):
+        with tempfile.TemporaryFile('wb') as tf:
+            dctx = zstd.ZstdDecompressor()
+            writer = dctx.stream_writer(tf)
+
+            self.assertEqual(writer.fileno(), tf.fileno())
+
+    def test_close(self):
+        foo = zstd.ZstdCompressor().compress(b'foo')
+
+        buffer = NonClosingBytesIO()
+        dctx = zstd.ZstdDecompressor()
+        writer = dctx.stream_writer(buffer)
+
+        writer.write(foo)
+        self.assertFalse(writer.closed)
+        self.assertFalse(buffer.closed)
+        writer.close()
+        self.assertTrue(writer.closed)
+        self.assertTrue(buffer.closed)
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            writer.write(b'')
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            writer.flush()
+
+        with self.assertRaisesRegexp(ValueError, 'stream is closed'):
+            with writer:
+                pass
+
+        self.assertEqual(buffer.getvalue(), b'foo')
+
+        # Context manager exit should close stream.
+        buffer = NonClosingBytesIO()
+        writer = dctx.stream_writer(buffer)
+
+        with writer:
+            writer.write(foo)
+
+        self.assertTrue(writer.closed)
+        self.assertEqual(buffer.getvalue(), b'foo')
+
+    def test_flush(self):
+        buffer = OpCountingBytesIO()
+        dctx = zstd.ZstdDecompressor()
+        writer = dctx.stream_writer(buffer)
+
+        writer.flush()
+        self.assertEqual(buffer._flush_count, 1)
+        writer.flush()
+        self.assertEqual(buffer._flush_count, 2)
+
     def test_empty_roundtrip(self):
         cctx = zstd.ZstdCompressor()
         empty = cctx.compress(b'')
@@ -616,9 +981,21 @@
         dctx = zstd.ZstdDecompressor()
         for source in sources:
             buffer = io.BytesIO()
+
+            decompressor = dctx.stream_writer(buffer)
+            decompressor.write(source)
+            self.assertEqual(buffer.getvalue(), b'foo')
+
+            buffer = NonClosingBytesIO()
+
             with dctx.stream_writer(buffer) as decompressor:
-                decompressor.write(source)
+                self.assertEqual(decompressor.write(source), 3)
+
+            self.assertEqual(buffer.getvalue(), b'foo')
 
+            buffer = io.BytesIO()
+            writer = dctx.stream_writer(buffer, write_return_read=True)
+            self.assertEqual(writer.write(source), len(source))
             self.assertEqual(buffer.getvalue(), b'foo')
 
     def test_large_roundtrip(self):
@@ -641,7 +1018,7 @@
         cctx = zstd.ZstdCompressor()
         compressed = cctx.compress(orig)
 
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         dctx = zstd.ZstdDecompressor()
         with dctx.stream_writer(buffer) as decompressor:
             pos = 0
@@ -651,6 +1028,17 @@
                 pos += 8192
         self.assertEqual(buffer.getvalue(), orig)
 
+        # Again with write_return_read=True
+        buffer = io.BytesIO()
+        writer = dctx.stream_writer(buffer, write_return_read=True)
+        pos = 0
+        while pos < len(compressed):
+            pos2 = pos + 8192
+            chunk = compressed[pos:pos2]
+            self.assertEqual(writer.write(chunk), len(chunk))
+            pos += 8192
+        self.assertEqual(buffer.getvalue(), orig)
+
     def test_dictionary(self):
         samples = []
         for i in range(128):
@@ -661,7 +1049,7 @@
         d = zstd.train_dictionary(8192, samples)
 
         orig = b'foobar' * 16384
-        buffer = io.BytesIO()
+        buffer = NonClosingBytesIO()
         cctx = zstd.ZstdCompressor(dict_data=d)
         with cctx.stream_writer(buffer) as compressor:
             self.assertEqual(compressor.write(orig), 0)
@@ -670,6 +1058,12 @@
         buffer = io.BytesIO()
 
         dctx = zstd.ZstdDecompressor(dict_data=d)
+        decompressor = dctx.stream_writer(buffer)
+        self.assertEqual(decompressor.write(compressed), len(orig))
+        self.assertEqual(buffer.getvalue(), orig)
+
+        buffer = NonClosingBytesIO()
+
         with dctx.stream_writer(buffer) as decompressor:
             self.assertEqual(decompressor.write(compressed), len(orig))
 
@@ -678,6 +1072,11 @@
     def test_memory_size(self):
         dctx = zstd.ZstdDecompressor()
         buffer = io.BytesIO()
+
+        decompressor = dctx.stream_writer(buffer)
+        size = decompressor.memory_size()
+        self.assertGreater(size, 100000)
+
         with dctx.stream_writer(buffer) as decompressor:
             size = decompressor.memory_size()
 
@@ -810,7 +1209,7 @@
     @unittest.skipUnless('ZSTD_SLOW_TESTS' in os.environ, 'ZSTD_SLOW_TESTS not set')
     def test_large_input(self):
         bytes = list(struct.Struct('>B').pack(i) for i in range(256))
-        compressed = io.BytesIO()
+        compressed = NonClosingBytesIO()
         input_size = 0
         cctx = zstd.ZstdCompressor(level=1)
         with cctx.stream_writer(compressed) as compressor:
@@ -823,7 +1222,7 @@
                 if have_compressed and have_raw:
                     break
 
-        compressed.seek(0)
+        compressed = io.BytesIO(compressed.getvalue())
         self.assertGreater(len(compressed.getvalue()),
                            zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE)
 
@@ -861,7 +1260,7 @@
 
         source = io.BytesIO()
 
-        compressed = io.BytesIO()
+        compressed = NonClosingBytesIO()
         with cctx.stream_writer(compressed) as compressor:
             for i in range(256):
                 chunk = b'\0' * 1024
@@ -874,7 +1273,7 @@
                                  max_output_size=len(source.getvalue()))
         self.assertEqual(simple, source.getvalue())
 
-        compressed.seek(0)
+        compressed = io.BytesIO(compressed.getvalue())
         streamed = b''.join(dctx.read_to_iter(compressed))
         self.assertEqual(streamed, source.getvalue())
 
@@ -1001,6 +1400,9 @@
     def test_invalid_inputs(self):
         dctx = zstd.ZstdDecompressor()
 
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         with self.assertRaises(TypeError):
             dctx.multi_decompress_to_buffer(True)
 
@@ -1020,6 +1422,10 @@
         frames = [cctx.compress(d) for d in original]
 
         dctx = zstd.ZstdDecompressor()
+
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         result = dctx.multi_decompress_to_buffer(frames)
 
         self.assertEqual(len(result), len(frames))
@@ -1041,6 +1447,10 @@
         sizes = struct.pack('=' + 'Q' * len(original), *map(len, original))
 
         dctx = zstd.ZstdDecompressor()
+
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         result = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes)
 
         self.assertEqual(len(result), len(frames))
@@ -1057,6 +1467,9 @@
 
         dctx = zstd.ZstdDecompressor()
 
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         segments = struct.pack('=QQQQ', 0, len(frames[0]), len(frames[0]), len(frames[1]))
         b = zstd.BufferWithSegments(b''.join(frames), segments)
 
@@ -1074,12 +1487,16 @@
         frames = [cctx.compress(d) for d in original]
         sizes = struct.pack('=' + 'Q' * len(original), *map(len, original))
 
+        dctx = zstd.ZstdDecompressor()
+
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         segments = struct.pack('=QQQQQQ', 0, len(frames[0]),
                                len(frames[0]), len(frames[1]),
                                len(frames[0]) + len(frames[1]), len(frames[2]))
         b = zstd.BufferWithSegments(b''.join(frames), segments)
 
-        dctx = zstd.ZstdDecompressor()
         result = dctx.multi_decompress_to_buffer(b, decompressed_sizes=sizes)
 
         self.assertEqual(len(result), len(frames))
@@ -1099,10 +1516,14 @@
             b'foo4' * 6,
         ]
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         frames = cctx.multi_compress_to_buffer(original)
 
         # Check round trip.
         dctx = zstd.ZstdDecompressor()
+
         decompressed = dctx.multi_decompress_to_buffer(frames, threads=3)
 
         self.assertEqual(len(decompressed), len(original))
@@ -1138,7 +1559,12 @@
         frames = [cctx.compress(s) for s in generate_samples()]
 
         dctx = zstd.ZstdDecompressor(dict_data=d)
+
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         result = dctx.multi_decompress_to_buffer(frames)
+
         self.assertEqual([o.tobytes() for o in result], generate_samples())
 
     def test_multiple_threads(self):
@@ -1149,6 +1575,10 @@
         frames.extend(cctx.compress(b'y' * 64) for i in range(256))
 
         dctx = zstd.ZstdDecompressor()
+
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         result = dctx.multi_decompress_to_buffer(frames, threads=-1)
 
         self.assertEqual(len(result), len(frames))
@@ -1164,6 +1594,9 @@
 
         dctx = zstd.ZstdDecompressor()
 
+        if not hasattr(dctx, 'multi_decompress_to_buffer'):
+            self.skipTest('multi_decompress_to_buffer not available')
+
         with self.assertRaisesRegexp(zstd.ZstdError,
                                      'error decompressing item 1: ('
                                      'Corrupted block|'
diff --git a/contrib/python-zstandard/tests/test_decompressor_fuzzing.py b/contrib/python-zstandard/tests/test_decompressor_fuzzing.py
--- a/contrib/python-zstandard/tests/test_decompressor_fuzzing.py
+++ b/contrib/python-zstandard/tests/test_decompressor_fuzzing.py
@@ -12,6 +12,7 @@
 
 from . common import (
     make_cffi,
+    NonClosingBytesIO,
     random_input_data,
 )
 
@@ -23,22 +24,200 @@
         suppress_health_check=[hypothesis.HealthCheck.large_base_example])
     @hypothesis.given(original=strategies.sampled_from(random_input_data()),
                       level=strategies.integers(min_value=1, max_value=5),
-                      source_read_size=strategies.integers(1, 16384),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
                       read_sizes=strategies.data())
-    def test_stream_source_read_variance(self, original, level, source_read_size,
-                                         read_sizes):
+    def test_stream_source_read_variance(self, original, level, streaming,
+                                         source_read_size, read_sizes):
         cctx = zstd.ZstdCompressor(level=level)
-        frame = cctx.compress(original)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            source.seek(0)
+        else:
+            frame = cctx.compress(original)
+            source = io.BytesIO(frame)
 
         dctx = zstd.ZstdDecompressor()
-        source = io.BytesIO(frame)
 
         chunks = []
         with dctx.stream_reader(source, read_size=source_read_size) as reader:
             while True:
-                read_size = read_sizes.draw(strategies.integers(1, 16384))
+                read_size = read_sizes.draw(strategies.integers(-1, 131072))
+                chunk = reader.read(read_size)
+                if not chunk and read_size:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), original)
+
+    # Similar to above except we have a constant read() size.
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
+                      read_size=strategies.integers(-1, 131072))
+    def test_stream_source_read_size(self, original, level, streaming,
+                                     source_read_size, read_size):
+        if read_size == 0:
+            read_size = 1
+
+        cctx = zstd.ZstdCompressor(level=level)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            source.seek(0)
+        else:
+            frame = cctx.compress(original)
+            source = io.BytesIO(frame)
+
+        dctx = zstd.ZstdDecompressor()
+
+        chunks = []
+        reader = dctx.stream_reader(source, read_size=source_read_size)
+        while True:
+            chunk = reader.read(read_size)
+            if not chunk and read_size:
+                break
+
+            chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), original)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
+                      read_sizes=strategies.data())
+    def test_buffer_source_read_variance(self, original, level, streaming,
+                                         source_read_size, read_sizes):
+        cctx = zstd.ZstdCompressor(level=level)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            frame = source.getvalue()
+        else:
+            frame = cctx.compress(original)
+
+        dctx = zstd.ZstdDecompressor()
+        chunks = []
+
+        with dctx.stream_reader(frame, read_size=source_read_size) as reader:
+            while True:
+                read_size = read_sizes.draw(strategies.integers(-1, 131072))
                 chunk = reader.read(read_size)
-                if not chunk:
+                if not chunk and read_size:
+                    break
+
+                chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), original)
+
+    # Similar to above except we have a constant read() size.
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
+                      read_size=strategies.integers(-1, 131072))
+    def test_buffer_source_constant_read_size(self, original, level, streaming,
+                                              source_read_size, read_size):
+        if read_size == 0:
+            read_size = -1
+
+        cctx = zstd.ZstdCompressor(level=level)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            frame = source.getvalue()
+        else:
+            frame = cctx.compress(original)
+
+        dctx = zstd.ZstdDecompressor()
+        chunks = []
+
+        reader = dctx.stream_reader(frame, read_size=source_read_size)
+        while True:
+            chunk = reader.read(read_size)
+            if not chunk and read_size:
+                break
+
+            chunks.append(chunk)
+
+        self.assertEqual(b''.join(chunks), original)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576))
+    def test_stream_source_readall(self, original, level, streaming,
+                                         source_read_size):
+        cctx = zstd.ZstdCompressor(level=level)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            source.seek(0)
+        else:
+            frame = cctx.compress(original)
+            source = io.BytesIO(frame)
+
+        dctx = zstd.ZstdDecompressor()
+
+        data = dctx.stream_reader(source, read_size=source_read_size).readall()
+        self.assertEqual(data, original)
+
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(original=strategies.sampled_from(random_input_data()),
+                      level=strategies.integers(min_value=1, max_value=5),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
+                      read_sizes=strategies.data())
+    def test_stream_source_read1_variance(self, original, level, streaming,
+                                          source_read_size, read_sizes):
+        cctx = zstd.ZstdCompressor(level=level)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            source.seek(0)
+        else:
+            frame = cctx.compress(original)
+            source = io.BytesIO(frame)
+
+        dctx = zstd.ZstdDecompressor()
+
+        chunks = []
+        with dctx.stream_reader(source, read_size=source_read_size) as reader:
+            while True:
+                read_size = read_sizes.draw(strategies.integers(-1, 131072))
+                chunk = reader.read1(read_size)
+                if not chunk and read_size:
                     break
 
                 chunks.append(chunk)
@@ -49,24 +228,36 @@
         suppress_health_check=[hypothesis.HealthCheck.large_base_example])
     @hypothesis.given(original=strategies.sampled_from(random_input_data()),
                       level=strategies.integers(min_value=1, max_value=5),
-                      source_read_size=strategies.integers(1, 16384),
+                      streaming=strategies.booleans(),
+                      source_read_size=strategies.integers(1, 1048576),
                       read_sizes=strategies.data())
-    def test_buffer_source_read_variance(self, original, level, source_read_size,
-                                         read_sizes):
+    def test_stream_source_readinto1_variance(self, original, level, streaming,
+                                          source_read_size, read_sizes):
         cctx = zstd.ZstdCompressor(level=level)
-        frame = cctx.compress(original)
+
+        if streaming:
+            source = io.BytesIO()
+            writer = cctx.stream_writer(source)
+            writer.write(original)
+            writer.flush(zstd.FLUSH_FRAME)
+            source.seek(0)
+        else:
+            frame = cctx.compress(original)
+            source = io.BytesIO(frame)
 
         dctx = zstd.ZstdDecompressor()
+
         chunks = []
-
-        with dctx.stream_reader(frame, read_size=source_read_size) as reader:
+        with dctx.stream_reader(source, read_size=source_read_size) as reader:
             while True:
-                read_size = read_sizes.draw(strategies.integers(1, 16384))
-                chunk = reader.read(read_size)
-                if not chunk:
+                read_size = read_sizes.draw(strategies.integers(1, 131072))
+                b = bytearray(read_size)
+                count = reader.readinto1(b)
+
+                if not count:
                     break
 
-                chunks.append(chunk)
+                chunks.append(bytes(b[0:count]))
 
         self.assertEqual(b''.join(chunks), original)
 
@@ -75,7 +266,7 @@
     @hypothesis.given(
         original=strategies.sampled_from(random_input_data()),
         level=strategies.integers(min_value=1, max_value=5),
-        source_read_size=strategies.integers(1, 16384),
+        source_read_size=strategies.integers(1, 1048576),
         seek_amounts=strategies.data(),
         read_sizes=strategies.data())
     def test_relative_seeks(self, original, level, source_read_size, seek_amounts,
@@ -99,6 +290,46 @@
 
                 self.assertEqual(original[offset:offset + len(chunk)], chunk)
 
+    @hypothesis.settings(
+        suppress_health_check=[hypothesis.HealthCheck.large_base_example])
+    @hypothesis.given(
+        originals=strategies.data(),
+        frame_count=strategies.integers(min_value=2, max_value=10),
+        level=strategies.integers(min_value=1, max_value=5),
+        source_read_size=strategies.integers(1, 1048576),
+        read_sizes=strategies.data())
+    def test_multiple_frames(self, originals, frame_count, level,
+                             source_read_size, read_sizes):
+
+        cctx = zstd.ZstdCompressor(level=level)
+        source = io.BytesIO()
+        buffer = io.BytesIO()
+        writer = cctx.stream_writer(buffer)
+
+        for i in range(frame_count):
+            data = originals.draw(strategies.sampled_from(random_input_data()))
+            source.write(data)
+            writer.write(data)
+            writer.flush(zstd.FLUSH_FRAME)
+
+        dctx = zstd.ZstdDecompressor()
+        buffer.seek(0)
+        reader = dctx.stream_reader(buffer, read_size=source_read_size,
+                                    read_across_frames=True)
+
+        chunks = []
+
+        while True:
+            read_amount = read_sizes.draw(strategies.integers(-1, 16384))
+            chunk = reader.read(read_amount)
+
+            if not chunk and read_amount:
+                break
+
+            chunks.append(chunk)
+
+        self.assertEqual(source.getvalue(), b''.join(chunks))
+
 
 @unittest.skipUnless('ZSTD_SLOW_TESTS' in os.environ, 'ZSTD_SLOW_TESTS not set')
 @make_cffi
@@ -113,7 +344,7 @@
 
         dctx = zstd.ZstdDecompressor()
         source = io.BytesIO(frame)
-        dest = io.BytesIO()
+        dest = NonClosingBytesIO()
 
         with dctx.stream_writer(dest, write_size=write_size) as decompressor:
             while True:
@@ -234,10 +465,12 @@
                                    write_checksum=True,
                                    **kwargs)
 
+        if not hasattr(cctx, 'multi_compress_to_buffer'):
+            self.skipTest('multi_compress_to_buffer not available')
+
         frames_buffer = cctx.multi_compress_to_buffer(original, threads=-1)
 
         dctx = zstd.ZstdDecompressor(**kwargs)
-
         result = dctx.multi_decompress_to_buffer(frames_buffer)
 
         self.assertEqual(len(result), len(original))
diff --git a/contrib/python-zstandard/tests/test_module_attributes.py b/contrib/python-zstandard/tests/test_module_attributes.py
--- a/contrib/python-zstandard/tests/test_module_attributes.py
+++ b/contrib/python-zstandard/tests/test_module_attributes.py
@@ -12,9 +12,9 @@
 @make_cffi
 class TestModuleAttributes(unittest.TestCase):
     def test_version(self):
-        self.assertEqual(zstd.ZSTD_VERSION, (1, 3, 6))
+        self.assertEqual(zstd.ZSTD_VERSION, (1, 3, 8))
 
-        self.assertEqual(zstd.__version__, '0.10.1')
+        self.assertEqual(zstd.__version__, '0.11.0')
 
     def test_constants(self):
         self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22)
@@ -29,6 +29,8 @@
             'DECOMPRESSION_RECOMMENDED_INPUT_SIZE',
             'DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE',
             'MAGIC_NUMBER',
+            'FLUSH_BLOCK',
+            'FLUSH_FRAME',
             'BLOCKSIZELOG_MAX',
             'BLOCKSIZE_MAX',
             'WINDOWLOG_MIN',
@@ -38,6 +40,8 @@
             'HASHLOG_MIN',
             'HASHLOG_MAX',
             'HASHLOG3_MAX',
+            'MINMATCH_MIN',
+            'MINMATCH_MAX',
             'SEARCHLOG_MIN',
             'SEARCHLOG_MAX',
             'SEARCHLENGTH_MIN',
@@ -55,6 +59,7 @@
             'STRATEGY_BTLAZY2',
             'STRATEGY_BTOPT',
             'STRATEGY_BTULTRA',
+            'STRATEGY_BTULTRA2',
             'DICT_TYPE_AUTO',
             'DICT_TYPE_RAWCONTENT',
             'DICT_TYPE_FULLDICT',
diff --git a/contrib/python-zstandard/zstandard/__init__.py b/contrib/python-zstandard/zstandard/__init__.py
--- a/contrib/python-zstandard/zstandard/__init__.py
+++ b/contrib/python-zstandard/zstandard/__init__.py
@@ -35,31 +35,31 @@
         from zstd import *
         backend = 'cext'
     elif platform.python_implementation() in ('PyPy',):
-        from zstd_cffi import *
+        from .cffi import *
         backend = 'cffi'
     else:
         try:
             from zstd import *
             backend = 'cext'
         except ImportError:
-            from zstd_cffi import *
+            from .cffi import *
             backend = 'cffi'
 elif _module_policy == 'cffi_fallback':
     try:
         from zstd import *
         backend = 'cext'
     except ImportError:
-        from zstd_cffi import *
+        from .cffi import *
         backend = 'cffi'
 elif _module_policy == 'cext':
     from zstd import *
     backend = 'cext'
 elif _module_policy == 'cffi':
-    from zstd_cffi import *
+    from .cffi import *
     backend = 'cffi'
 else:
     raise ImportError('unknown module import policy: %s; use default, cffi_fallback, '
                       'cext, or cffi' % _module_policy)
 
 # Keep this in sync with python-zstandard.h.
-__version__ = '0.10.1'
+__version__ = '0.11.0'
diff --git a/contrib/python-zstandard/zstd_cffi.py b/contrib/python-zstandard/zstandard/cffi.py
rename from contrib/python-zstandard/zstd_cffi.py
rename to contrib/python-zstandard/zstandard/cffi.py
--- a/contrib/python-zstandard/zstd_cffi.py
+++ b/contrib/python-zstandard/zstandard/cffi.py
@@ -28,6 +28,8 @@
     'train_dictionary',
 
     # Constants.
+    'FLUSH_BLOCK',
+    'FLUSH_FRAME',
     'COMPRESSOBJ_FLUSH_FINISH',
     'COMPRESSOBJ_FLUSH_BLOCK',
     'ZSTD_VERSION',
@@ -49,6 +51,8 @@
     'HASHLOG_MIN',
     'HASHLOG_MAX',
     'HASHLOG3_MAX',
+    'MINMATCH_MIN',
+    'MINMATCH_MAX',
     'SEARCHLOG_MIN',
     'SEARCHLOG_MAX',
     'SEARCHLENGTH_MIN',
@@ -66,6 +70,7 @@
     'STRATEGY_BTLAZY2',
     'STRATEGY_BTOPT',
     'STRATEGY_BTULTRA',
+    'STRATEGY_BTULTRA2',
     'DICT_TYPE_AUTO',
     'DICT_TYPE_RAWCONTENT',
     'DICT_TYPE_FULLDICT',
@@ -114,10 +119,12 @@
 HASHLOG_MIN = lib.ZSTD_HASHLOG_MIN
 HASHLOG_MAX = lib.ZSTD_HASHLOG_MAX
 HASHLOG3_MAX = lib.ZSTD_HASHLOG3_MAX
+MINMATCH_MIN = lib.ZSTD_MINMATCH_MIN
+MINMATCH_MAX = lib.ZSTD_MINMATCH_MAX
 SEARCHLOG_MIN = lib.ZSTD_SEARCHLOG_MIN
 SEARCHLOG_MAX = lib.ZSTD_SEARCHLOG_MAX
-SEARCHLENGTH_MIN = lib.ZSTD_SEARCHLENGTH_MIN
-SEARCHLENGTH_MAX = lib.ZSTD_SEARCHLENGTH_MAX
+SEARCHLENGTH_MIN = lib.ZSTD_MINMATCH_MIN
+SEARCHLENGTH_MAX = lib.ZSTD_MINMATCH_MAX
 TARGETLENGTH_MIN = lib.ZSTD_TARGETLENGTH_MIN
 TARGETLENGTH_MAX = lib.ZSTD_TARGETLENGTH_MAX
 LDM_MINMATCH_MIN = lib.ZSTD_LDM_MINMATCH_MIN
@@ -132,6 +139,7 @@
 STRATEGY_BTLAZY2 = lib.ZSTD_btlazy2
 STRATEGY_BTOPT = lib.ZSTD_btopt
 STRATEGY_BTULTRA = lib.ZSTD_btultra
+STRATEGY_BTULTRA2 = lib.ZSTD_btultra2
 
 DICT_TYPE_AUTO = lib.ZSTD_dct_auto
 DICT_TYPE_RAWCONTENT = lib.ZSTD_dct_rawContent
@@ -140,6 +148,9 @@
 FORMAT_ZSTD1 = lib.ZSTD_f_zstd1
 FORMAT_ZSTD1_MAGICLESS = lib.ZSTD_f_zstd1_magicless
 
+FLUSH_BLOCK = 0
+FLUSH_FRAME = 1
+
 COMPRESSOBJ_FLUSH_FINISH = 0
 COMPRESSOBJ_FLUSH_BLOCK = 1
 
@@ -182,27 +193,27 @@
     res = ffi.gc(res, lib.ZSTD_freeCCtxParams)
 
     attrs = [
-        (lib.ZSTD_p_format, params.format),
-        (lib.ZSTD_p_compressionLevel, params.compression_level),
-        (lib.ZSTD_p_windowLog, params.window_log),
-        (lib.ZSTD_p_hashLog, params.hash_log),
-        (lib.ZSTD_p_chainLog, params.chain_log),
-        (lib.ZSTD_p_searchLog, params.search_log),
-        (lib.ZSTD_p_minMatch, params.min_match),
-        (lib.ZSTD_p_targetLength, params.target_length),
-        (lib.ZSTD_p_compressionStrategy, params.compression_strategy),
-        (lib.ZSTD_p_contentSizeFlag, params.write_content_size),
-        (lib.ZSTD_p_checksumFlag, params.write_checksum),
-        (lib.ZSTD_p_dictIDFlag, params.write_dict_id),
-        (lib.ZSTD_p_nbWorkers, params.threads),
-        (lib.ZSTD_p_jobSize, params.job_size),
-        (lib.ZSTD_p_overlapSizeLog, params.overlap_size_log),
-        (lib.ZSTD_p_forceMaxWindow, params.force_max_window),
-        (lib.ZSTD_p_enableLongDistanceMatching, params.enable_ldm),
-        (lib.ZSTD_p_ldmHashLog, params.ldm_hash_log),
-        (lib.ZSTD_p_ldmMinMatch, params.ldm_min_match),
-        (lib.ZSTD_p_ldmBucketSizeLog, params.ldm_bucket_size_log),
-        (lib.ZSTD_p_ldmHashEveryLog, params.ldm_hash_every_log),
+        (lib.ZSTD_c_format, params.format),
+        (lib.ZSTD_c_compressionLevel, params.compression_level),
+        (lib.ZSTD_c_windowLog, params.window_log),
+        (lib.ZSTD_c_hashLog, params.hash_log),
+        (lib.ZSTD_c_chainLog, params.chain_log),
+        (lib.ZSTD_c_searchLog, params.search_log),
+        (lib.ZSTD_c_minMatch, params.min_match),
+        (lib.ZSTD_c_targetLength, params.target_length),
+        (lib.ZSTD_c_strategy, params.compression_strategy),
+        (lib.ZSTD_c_contentSizeFlag, params.write_content_size),
+        (lib.ZSTD_c_checksumFlag, params.write_checksum),
+        (lib.ZSTD_c_dictIDFlag, params.write_dict_id),
+        (lib.ZSTD_c_nbWorkers, params.threads),
+        (lib.ZSTD_c_jobSize, params.job_size),
+        (lib.ZSTD_c_overlapLog, params.overlap_log),
+        (lib.ZSTD_c_forceMaxWindow, params.force_max_window),
+        (lib.ZSTD_c_enableLongDistanceMatching, params.enable_ldm),
+        (lib.ZSTD_c_ldmHashLog, params.ldm_hash_log),
+        (lib.ZSTD_c_ldmMinMatch, params.ldm_min_match),
+        (lib.ZSTD_c_ldmBucketSizeLog, params.ldm_bucket_size_log),
+        (lib.ZSTD_c_ldmHashRateLog, params.ldm_hash_rate_log),
     ]
 
     for param, value in attrs:
@@ -220,7 +231,7 @@
             'chain_log': 'chainLog',
             'hash_log': 'hashLog',
             'search_log': 'searchLog',
-            'min_match': 'searchLength',
+            'min_match': 'minMatch',
             'target_length': 'targetLength',
             'compression_strategy': 'strategy',
         }
@@ -233,41 +244,170 @@
 
     def __init__(self, format=0, compression_level=0, window_log=0, hash_log=0,
                  chain_log=0, search_log=0, min_match=0, target_length=0,
-                 compression_strategy=0, write_content_size=1, write_checksum=0,
-                 write_dict_id=0, job_size=0, overlap_size_log=0,
-                 force_max_window=0, enable_ldm=0, ldm_hash_log=0,
-                 ldm_min_match=0, ldm_bucket_size_log=0, ldm_hash_every_log=0,
-                 threads=0):
+                 strategy=-1, compression_strategy=-1,
+                 write_content_size=1, write_checksum=0,
+                 write_dict_id=0, job_size=0, overlap_log=-1,
+                 overlap_size_log=-1, force_max_window=0, enable_ldm=0,
+                 ldm_hash_log=0, ldm_min_match=0, ldm_bucket_size_log=0,
+                 ldm_hash_rate_log=-1, ldm_hash_every_log=-1, threads=0):
+
+        params = lib.ZSTD_createCCtxParams()
+        if params == ffi.NULL:
+            raise MemoryError()
+
+        params = ffi.gc(params, lib.ZSTD_freeCCtxParams)
+
+        self._params = params
 
         if threads < 0:
             threads = _cpu_count()
 
-        self.format = format
-        self.compression_level = compression_level
-        self.window_log = window_log
-        self.hash_log = hash_log
-        self.chain_log = chain_log
-        self.search_log = search_log
-        self.min_match = min_match
-        self.target_length = target_length
-        self.compression_strategy = compression_strategy
-        self.write_content_size = write_content_size
-        self.write_checksum = write_checksum
-        self.write_dict_id = write_dict_id
-        self.job_size = job_size
-        self.overlap_size_log = overlap_size_log
-        self.force_max_window = force_max_window
-        self.enable_ldm = enable_ldm
-        self.ldm_hash_log = ldm_hash_log
-        self.ldm_min_match = ldm_min_match
-        self.ldm_bucket_size_log = ldm_bucket_size_log
-        self.ldm_hash_every_log = ldm_hash_every_log
-        self.threads = threads
-
-        self.params = _make_cctx_params(self)
+        # We need to set ZSTD_c_nbWorkers before ZSTD_c_jobSize and ZSTD_c_overlapLog
+        # because setting ZSTD_c_nbWorkers resets the other parameters.
+        _set_compression_parameter(params, lib.ZSTD_c_nbWorkers, threads)
+
+        _set_compression_parameter(params, lib.ZSTD_c_format, format)
+        _set_compression_parameter(params, lib.ZSTD_c_compressionLevel, compression_level)
+        _set_compression_parameter(params, lib.ZSTD_c_windowLog, window_log)
+        _set_compression_parameter(params, lib.ZSTD_c_hashLog, hash_log)
+        _set_compression_parameter(params, lib.ZSTD_c_chainLog, chain_log)
+        _set_compression_parameter(params, lib.ZSTD_c_searchLog, search_log)
+        _set_compression_parameter(params, lib.ZSTD_c_minMatch, min_match)
+        _set_compression_parameter(params, lib.ZSTD_c_targetLength, target_length)
+
+        if strategy != -1 and compression_strategy != -1:
+            raise ValueError('cannot specify both compression_strategy and strategy')
+
+        if compression_strategy != -1:
+            strategy = compression_strategy
+        elif strategy == -1:
+            strategy = 0
+
+        _set_compression_parameter(params, lib.ZSTD_c_strategy, strategy)
+        _set_compression_parameter(params, lib.ZSTD_c_contentSizeFlag, write_content_size)
+        _set_compression_parameter(params, lib.ZSTD_c_checksumFlag, write_checksum)
+        _set_compression_parameter(params, lib.ZSTD_c_dictIDFlag, write_dict_id)
+        _set_compression_parameter(params, lib.ZSTD_c_jobSize, job_size)
+
+        if overlap_log != -1 and overlap_size_log != -1:
+            raise ValueError('cannot specify both overlap_log and overlap_size_log')
+
+        if overlap_size_log != -1:
+            overlap_log = overlap_size_log
+        elif overlap_log == -1:
+            overlap_log = 0
+
+        _set_compression_parameter(params, lib.ZSTD_c_overlapLog, overlap_log)
+        _set_compression_parameter(params, lib.ZSTD_c_forceMaxWindow, force_max_window)
+        _set_compression_parameter(params, lib.ZSTD_c_enableLongDistanceMatching, enable_ldm)
+        _set_compression_parameter(params, lib.ZSTD_c_ldmHashLog, ldm_hash_log)
+        _set_compression_parameter(params, lib.ZSTD_c_ldmMinMatch, ldm_min_match)
+        _set_compression_parameter(params, lib.ZSTD_c_ldmBucketSizeLog, ldm_bucket_size_log)
+
+        if ldm_hash_rate_log != -1 and ldm_hash_every_log != -1:
+            raise ValueError('cannot specify both ldm_hash_rate_log and ldm_hash_every_log')
+
+        if ldm_hash_every_log != -1:
+            ldm_hash_rate_log = ldm_hash_every_log
+        elif ldm_hash_rate_log == -1:
+            ldm_hash_rate_log = 0
+
+        _set_compression_parameter(params, lib.ZSTD_c_ldmHashRateLog, ldm_hash_rate_log)
+
+    @property
+    def format(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_format)
+
+    @property
+    def compression_level(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_compressionLevel)
+
+    @property
+    def window_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_windowLog)
+
+    @property
+    def hash_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_hashLog)
+
+    @property
+    def chain_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_chainLog)
+
+    @property
+    def search_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_searchLog)
+
+    @property
+    def min_match(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_minMatch)
+
+    @property
+    def target_length(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_targetLength)
+
+    @property
+    def compression_strategy(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_strategy)
+
+    @property
+    def write_content_size(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_contentSizeFlag)
+
+    @property
+    def write_checksum(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_checksumFlag)
+
+    @property
+    def write_dict_id(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_dictIDFlag)
+
+    @property
+    def job_size(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_jobSize)
+
+    @property
+    def overlap_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_overlapLog)
+
+    @property
+    def overlap_size_log(self):
+        return self.overlap_log
+
+    @property
+    def force_max_window(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_forceMaxWindow)
+
+    @property
+    def enable_ldm(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_enableLongDistanceMatching)
+
+    @property
+    def ldm_hash_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_ldmHashLog)
+
+    @property
+    def ldm_min_match(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_ldmMinMatch)
+
+    @property
+    def ldm_bucket_size_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_ldmBucketSizeLog)
+
+    @property
+    def ldm_hash_rate_log(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_ldmHashRateLog)
+
+    @property
+    def ldm_hash_every_log(self):
+        return self.ldm_hash_rate_log
+
+    @property
+    def threads(self):
+        return _get_compression_parameter(self._params, lib.ZSTD_c_nbWorkers)
 
     def estimated_compression_context_size(self):
-        return lib.ZSTD_estimateCCtxSize_usingCCtxParams(self.params)
+        return lib.ZSTD_estimateCCtxSize_usingCCtxParams(self._params)
 
 CompressionParameters = ZstdCompressionParameters
 
@@ -276,31 +416,53 @@
 
 
 def _set_compression_parameter(params, param, value):
-    zresult = lib.ZSTD_CCtxParam_setParameter(params, param,
-                                              ffi.cast('unsigned', value))
+    zresult = lib.ZSTD_CCtxParam_setParameter(params, param, value)
     if lib.ZSTD_isError(zresult):
         raise ZstdError('unable to set compression context parameter: %s' %
                         _zstd_error(zresult))
 
+
+def _get_compression_parameter(params, param):
+    result = ffi.new('int *')
+
+    zresult = lib.ZSTD_CCtxParam_getParameter(params, param, result)
+    if lib.ZSTD_isError(zresult):
+        raise ZstdError('unable to get compression context parameter: %s' %
+                        _zstd_error(zresult))
+
+    return result[0]
+
+
 class ZstdCompressionWriter(object):
-    def __init__(self, compressor, writer, source_size, write_size):
+    def __init__(self, compressor, writer, source_size, write_size,
+                 write_return_read):
         self._compressor = compressor
         self._writer = writer
-        self._source_size = source_size
         self._write_size = write_size
+        self._write_return_read = bool(write_return_read)
         self._entered = False
+        self._closed = False
         self._bytes_compressed = 0
 
-    def __enter__(self):
-        if self._entered:
-            raise ZstdError('cannot __enter__ multiple times')
-
-        zresult = lib.ZSTD_CCtx_setPledgedSrcSize(self._compressor._cctx,
-                                                  self._source_size)
+        self._dst_buffer = ffi.new('char[]', write_size)
+        self._out_buffer = ffi.new('ZSTD_outBuffer *')
+        self._out_buffer.dst = self._dst_buffer
+        self._out_buffer.size = len(self._dst_buffer)
+        self._out_buffer.pos = 0
+
+        zresult = lib.ZSTD_CCtx_setPledgedSrcSize(compressor._cctx,
+                                                  source_size)
         if lib.ZSTD_isError(zresult):
             raise ZstdError('error setting source size: %s' %
                             _zstd_error(zresult))
 
+    def __enter__(self):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if self._entered:
+            raise ZstdError('cannot __enter__ multiple times')
+
         self._entered = True
         return self
 
@@ -308,50 +470,79 @@
         self._entered = False
 
         if not exc_type and not exc_value and not exc_tb:
-            dst_buffer = ffi.new('char[]', self._write_size)
-
-            out_buffer = ffi.new('ZSTD_outBuffer *')
-            in_buffer = ffi.new('ZSTD_inBuffer *')
-
-            out_buffer.dst = dst_buffer
-            out_buffer.size = len(dst_buffer)
-            out_buffer.pos = 0
-
-            in_buffer.src = ffi.NULL
-            in_buffer.size = 0
-            in_buffer.pos = 0
-
-            while True:
-                zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                    out_buffer, in_buffer,
-                                                    lib.ZSTD_e_end)
-
-                if lib.ZSTD_isError(zresult):
-                    raise ZstdError('error ending compression stream: %s' %
-                                    _zstd_error(zresult))
-
-                if out_buffer.pos:
-                    self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)[:])
-                    out_buffer.pos = 0
-
-                if zresult == 0:
-                    break
+            self.close()
 
         self._compressor = None
 
         return False
 
     def memory_size(self):
-        if not self._entered:
-            raise ZstdError('cannot determine size of an inactive compressor; '
-                            'call when a context manager is active')
-
         return lib.ZSTD_sizeof_CCtx(self._compressor._cctx)
 
+    def fileno(self):
+        f = getattr(self._writer, 'fileno', None)
+        if f:
+            return f()
+        else:
+            raise OSError('fileno not available on underlying writer')
+
+    def close(self):
+        if self._closed:
+            return
+
+        try:
+            self.flush(FLUSH_FRAME)
+        finally:
+            self._closed = True
+
+        # Call close() on underlying stream as well.
+        f = getattr(self._writer, 'close', None)
+        if f:
+            f()
+
+    @property
+    def closed(self):
+        return self._closed
+
+    def isatty(self):
+        return False
+
+    def readable(self):
+        return False
+
+    def readline(self, size=-1):
+        raise io.UnsupportedOperation()
+
+    def readlines(self, hint=-1):
+        raise io.UnsupportedOperation()
+
+    def seek(self, offset, whence=None):
+        raise io.UnsupportedOperation()
+
+    def seekable(self):
+        return False
+
+    def truncate(self, size=None):
+        raise io.UnsupportedOperation()
+
+    def writable(self):
+        return True
+
+    def writelines(self, lines):
+        raise NotImplementedError('writelines() is not yet implemented')
+
+    def read(self, size=-1):
+        raise io.UnsupportedOperation()
+
+    def readall(self):
+        raise io.UnsupportedOperation()
+
+    def readinto(self, b):
+        raise io.UnsupportedOperation()
+
     def write(self, data):
-        if not self._entered:
-            raise ZstdError('write() must be called from an active context '
-                            'manager')
+        if self._closed:
+            raise ValueError('stream is closed')
 
         total_write = 0
 
@@ -362,16 +553,13 @@
         in_buffer.size = len(data_buffer)
         in_buffer.pos = 0
 
-        out_buffer = ffi.new('ZSTD_outBuffer *')
-        dst_buffer = ffi.new('char[]', self._write_size)
-        out_buffer.dst = dst_buffer
-        out_buffer.size = self._write_size
+        out_buffer = self._out_buffer
         out_buffer.pos = 0
 
         while in_buffer.pos < in_buffer.size:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                out_buffer, in_buffer,
-                                                lib.ZSTD_e_continue)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               out_buffer, in_buffer,
+                                               lib.ZSTD_e_continue)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd compress error: %s' %
                                 _zstd_error(zresult))
@@ -382,18 +570,25 @@
                 self._bytes_compressed += out_buffer.pos
                 out_buffer.pos = 0
 
-        return total_write
-
-    def flush(self):
-        if not self._entered:
-            raise ZstdError('flush must be called from an active context manager')
+        if self._write_return_read:
+            return in_buffer.pos
+        else:
+            return total_write
+
+    def flush(self, flush_mode=FLUSH_BLOCK):
+        if flush_mode == FLUSH_BLOCK:
+            flush = lib.ZSTD_e_flush
+        elif flush_mode == FLUSH_FRAME:
+            flush = lib.ZSTD_e_end
+        else:
+            raise ValueError('unknown flush_mode: %r' % flush_mode)
+
+        if self._closed:
+            raise ValueError('stream is closed')
 
         total_write = 0
 
-        out_buffer = ffi.new('ZSTD_outBuffer *')
-        dst_buffer = ffi.new('char[]', self._write_size)
-        out_buffer.dst = dst_buffer
-        out_buffer.size = self._write_size
+        out_buffer = self._out_buffer
         out_buffer.pos = 0
 
         in_buffer = ffi.new('ZSTD_inBuffer *')
@@ -402,9 +597,9 @@
         in_buffer.pos = 0
 
         while True:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                out_buffer, in_buffer,
-                                                lib.ZSTD_e_flush)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               out_buffer, in_buffer,
+                                               flush)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd compress error: %s' %
                                 _zstd_error(zresult))
@@ -438,10 +633,10 @@
         chunks = []
 
         while source.pos < len(data):
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                self._out,
-                                                source,
-                                                lib.ZSTD_e_continue)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               self._out,
+                                               source,
+                                               lib.ZSTD_e_continue)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd compress error: %s' %
                                 _zstd_error(zresult))
@@ -477,10 +672,10 @@
         chunks = []
 
         while True:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                self._out,
-                                                in_buffer,
-                                                z_flush_mode)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               self._out,
+                                               in_buffer,
+                                               z_flush_mode)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('error ending compression stream: %s' %
                                 _zstd_error(zresult))
@@ -528,10 +723,10 @@
         self._in.pos = 0
 
         while self._in.pos < self._in.size:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                self._out,
-                                                self._in,
-                                                lib.ZSTD_e_continue)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               self._out,
+                                               self._in,
+                                               lib.ZSTD_e_continue)
 
             if self._in.pos == self._in.size:
                 self._in.src = ffi.NULL
@@ -555,9 +750,9 @@
                             'previous operation')
 
         while True:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                self._out, self._in,
-                                                lib.ZSTD_e_flush)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               self._out, self._in,
+                                               lib.ZSTD_e_flush)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd compress error: %s' % _zstd_error(zresult))
 
@@ -577,9 +772,9 @@
                             'previous operation')
 
         while True:
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                self._out, self._in,
-                                                lib.ZSTD_e_end)
+            zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                               self._out, self._in,
+                                               lib.ZSTD_e_end)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd compress error: %s' % _zstd_error(zresult))
 
@@ -592,7 +787,7 @@
                 return
 
 
-class CompressionReader(object):
+class ZstdCompressionReader(object):
     def __init__(self, compressor, source, read_size):
         self._compressor = compressor
         self._source = source
@@ -661,7 +856,16 @@
         return self._bytes_compressed
 
     def readall(self):
-        raise NotImplementedError()
+        chunks = []
+
+        while True:
+            chunk = self.read(1048576)
+            if not chunk:
+                break
+
+            chunks.append(chunk)
+
+        return b''.join(chunks)
 
     def __iter__(self):
         raise io.UnsupportedOperation()
@@ -671,16 +875,67 @@
 
     next = __next__
 
+    def _read_input(self):
+        if self._finished_input:
+            return
+
+        if hasattr(self._source, 'read'):
+            data = self._source.read(self._read_size)
+
+            if not data:
+                self._finished_input = True
+                return
+
+            self._source_buffer = ffi.from_buffer(data)
+            self._in_buffer.src = self._source_buffer
+            self._in_buffer.size = len(self._source_buffer)
+            self._in_buffer.pos = 0
+        else:
+            self._source_buffer = ffi.from_buffer(self._source)
+            self._in_buffer.src = self._source_buffer
+            self._in_buffer.size = len(self._source_buffer)
+            self._in_buffer.pos = 0
+
+    def _compress_into_buffer(self, out_buffer):
+        if self._in_buffer.pos >= self._in_buffer.size:
+            return
+
+        old_pos = out_buffer.pos
+
+        zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                           out_buffer, self._in_buffer,
+                                           lib.ZSTD_e_continue)
+
+        self._bytes_compressed += out_buffer.pos - old_pos
+
+        if self._in_buffer.pos == self._in_buffer.size:
+            self._in_buffer.src = ffi.NULL
+            self._in_buffer.pos = 0
+            self._in_buffer.size = 0
+            self._source_buffer = None
+
+            if not hasattr(self._source, 'read'):
+                self._finished_input = True
+
+        if lib.ZSTD_isError(zresult):
+            raise ZstdError('zstd compress error: %s',
+                            _zstd_error(zresult))
+
+        return out_buffer.pos and out_buffer.pos == out_buffer.size
+
     def read(self, size=-1):
         if self._closed:
             raise ValueError('stream is closed')
 
-        if self._finished_output:
+        if size < -1:
+            raise ValueError('cannot read negative amounts less than -1')
+
+        if size == -1:
+            return self.readall()
+
+        if self._finished_output or size == 0:
             return b''
 
-        if size < 1:
-            raise ValueError('cannot read negative or size 0 amounts')
-
         # Need a dedicated ref to dest buffer otherwise it gets collected.
         dst_buffer = ffi.new('char[]', size)
         out_buffer = ffi.new('ZSTD_outBuffer *')
@@ -688,71 +943,21 @@
         out_buffer.size = size
         out_buffer.pos = 0
 
-        def compress_input():
-            if self._in_buffer.pos >= self._in_buffer.size:
-                return
-
-            old_pos = out_buffer.pos
-
-            zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                                out_buffer, self._in_buffer,
-                                                lib.ZSTD_e_continue)
-
-            self._bytes_compressed += out_buffer.pos - old_pos
-
-            if self._in_buffer.pos == self._in_buffer.size:
-                self._in_buffer.src = ffi.NULL
-                self._in_buffer.pos = 0
-                self._in_buffer.size = 0
-                self._source_buffer = None
-
-                if not hasattr(self._source, 'read'):
-                    self._finished_input = True
-
-            if lib.ZSTD_isError(zresult):
-                raise ZstdError('zstd compress error: %s',
-                                _zstd_error(zresult))
-
-            if out_buffer.pos and out_buffer.pos == out_buffer.size:
-                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
-
-        def get_input():
-            if self._finished_input:
-                return
-
-            if hasattr(self._source, 'read'):
-                data = self._source.read(self._read_size)
-
-                if not data:
-                    self._finished_input = True
-                    return
-
-                self._source_buffer = ffi.from_buffer(data)
-                self._in_buffer.src = self._source_buffer
-                self._in_buffer.size = len(self._source_buffer)
-                self._in_buffer.pos = 0
-            else:
-                self._source_buffer = ffi.from_buffer(self._source)
-                self._in_buffer.src = self._source_buffer
-                self._in_buffer.size = len(self._source_buffer)
-                self._in_buffer.pos = 0
-
-        result = compress_input()
-        if result:
-            return result
+        if self._compress_into_buffer(out_buffer):
+            return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
 
         while not self._finished_input:
-            get_input()
-            result = compress_input()
-            if result:
-                return result
+            self._read_input()
+
+            if self._compress_into_buffer(out_buffer):
+                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
 
         # EOF
         old_pos = out_buffer.pos
 
-        zresult = lib.ZSTD_compress_generic(self._compressor._cctx,
-                                            out_buffer, self._in_buffer,
-                                            lib.ZSTD_e_end)
+        zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                           out_buffer, self._in_buffer,
+                                           lib.ZSTD_e_end)
 
         self._bytes_compressed += out_buffer.pos - old_pos
 
@@ -765,6 +970,159 @@
 
         return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
 
+    def read1(self, size=-1):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if size < -1:
+            raise ValueError('cannot read negative amounts less than -1')
+
+        if self._finished_output or size == 0:
+            return b''
+
+        # -1 returns arbitrary number of bytes.
+        if size == -1:
+            size = COMPRESSION_RECOMMENDED_OUTPUT_SIZE
+
+        dst_buffer = ffi.new('char[]', size)
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dst_buffer
+        out_buffer.size = size
+        out_buffer.pos = 0
+
+        # read1() dictates that we can perform at most 1 call to the
+        # underlying stream to get input. However, we can't satisfy this
+        # restriction with compression because not all input generates output.
+        # It is possible to perform a block flush in order to ensure output.
+        # But this may not be desirable behavior. So we allow multiple read()
+        # to the underlying stream. But unlike read(), we stop once we have
+        # any output.
+
+        self._compress_into_buffer(out_buffer)
+        if out_buffer.pos:
+            return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+        while not self._finished_input:
+            self._read_input()
+
+            # If we've filled the output buffer, return immediately.
+            if self._compress_into_buffer(out_buffer):
+                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+            # If we've populated the output buffer and we're not at EOF,
+            # also return, as we've satisfied the read1() limits.
+            if out_buffer.pos and not self._finished_input:
+                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+            # Else if we're at EOS and we have room left in the buffer,
+            # fall through to below and try to add more data to the output.
+
+        # EOF.
+        old_pos = out_buffer.pos
+
+        zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                           out_buffer, self._in_buffer,
+                                           lib.ZSTD_e_end)
+
+        self._bytes_compressed += out_buffer.pos - old_pos
+
+        if lib.ZSTD_isError(zresult):
+            raise ZstdError('error ending compression stream: %s' %
+                            _zstd_error(zresult))
+
+        if zresult == 0:
+            self._finished_output = True
+
+        return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+    def readinto(self, b):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if self._finished_output:
+            return 0
+
+        # TODO use writable=True once we require CFFI >= 1.12.
+        dest_buffer = ffi.from_buffer(b)
+        ffi.memmove(b, b'', 0)
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dest_buffer
+        out_buffer.size = len(dest_buffer)
+        out_buffer.pos = 0
+
+        if self._compress_into_buffer(out_buffer):
+            return out_buffer.pos
+
+        while not self._finished_input:
+            self._read_input()
+            if self._compress_into_buffer(out_buffer):
+                return out_buffer.pos
+
+        # EOF.
+        old_pos = out_buffer.pos
+        zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                           out_buffer, self._in_buffer,
+                                           lib.ZSTD_e_end)
+
+        self._bytes_compressed += out_buffer.pos - old_pos
+
+        if lib.ZSTD_isError(zresult):
+            raise ZstdError('error ending compression stream: %s',
+                            _zstd_error(zresult))
+
+        if zresult == 0:
+            self._finished_output = True
+
+        return out_buffer.pos
+
+    def readinto1(self, b):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if self._finished_output:
+            return 0
+
+        # TODO use writable=True once we require CFFI >= 1.12.
+        dest_buffer = ffi.from_buffer(b)
+        ffi.memmove(b, b'', 0)
+
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dest_buffer
+        out_buffer.size = len(dest_buffer)
+        out_buffer.pos = 0
+
+        self._compress_into_buffer(out_buffer)
+        if out_buffer.pos:
+            return out_buffer.pos
+
+        while not self._finished_input:
+            self._read_input()
+
+            if self._compress_into_buffer(out_buffer):
+                return out_buffer.pos
+
+            if out_buffer.pos and not self._finished_input:
+                return out_buffer.pos
+
+        # EOF.
+        old_pos = out_buffer.pos
+
+        zresult = lib.ZSTD_compressStream2(self._compressor._cctx,
+                                           out_buffer, self._in_buffer,
+                                           lib.ZSTD_e_end)
+
+        self._bytes_compressed += out_buffer.pos - old_pos
+
+        if lib.ZSTD_isError(zresult):
+            raise ZstdError('error ending compression stream: %s' %
+                            _zstd_error(zresult))
+
+        if zresult == 0:
+            self._finished_output = True
+
+        return out_buffer.pos
+
+
 class ZstdCompressor(object):
     def __init__(self, level=3, dict_data=None, compression_params=None,
                  write_checksum=None, write_content_size=None,
@@ -803,25 +1161,25 @@
             self._params = ffi.gc(params, lib.ZSTD_freeCCtxParams)
 
             _set_compression_parameter(self._params,
-                                       lib.ZSTD_p_compressionLevel,
+                                       lib.ZSTD_c_compressionLevel,
                                        level)
 
             _set_compression_parameter(
                 self._params,
-                lib.ZSTD_p_contentSizeFlag,
+                lib.ZSTD_c_contentSizeFlag,
                 write_content_size if write_content_size is not None else 1)
 
             _set_compression_parameter(self._params,
-                                       lib.ZSTD_p_checksumFlag,
+                                       lib.ZSTD_c_checksumFlag,
                                        1 if write_checksum else 0)
 
             _set_compression_parameter(self._params,
-                                       lib.ZSTD_p_dictIDFlag,
+                                       lib.ZSTD_c_dictIDFlag,
                                        1 if write_dict_id else 0)
 
             if threads:
                 _set_compression_parameter(self._params,
-                                           lib.ZSTD_p_nbWorkers,
+                                           lib.ZSTD_c_nbWorkers,
                                            threads)
 
         cctx = lib.ZSTD_createCCtx()
@@ -864,7 +1222,7 @@
         return lib.ZSTD_sizeof_CCtx(self._cctx)
 
     def compress(self, data):
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         data_buffer = ffi.from_buffer(data)
 
@@ -887,10 +1245,10 @@
         in_buffer.size = len(data_buffer)
         in_buffer.pos = 0
 
-        zresult = lib.ZSTD_compress_generic(self._cctx,
-                                            out_buffer,
-                                            in_buffer,
-                                            lib.ZSTD_e_end)
+        zresult = lib.ZSTD_compressStream2(self._cctx,
+                                           out_buffer,
+                                           in_buffer,
+                                           lib.ZSTD_e_end)
 
         if lib.ZSTD_isError(zresult):
             raise ZstdError('cannot compress: %s' %
@@ -901,7 +1259,7 @@
         return ffi.buffer(out, out_buffer.pos)[:]
 
     def compressobj(self, size=-1):
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         if size < 0:
             size = lib.ZSTD_CONTENTSIZE_UNKNOWN
@@ -923,7 +1281,7 @@
         return cobj
 
     def chunker(self, size=-1, chunk_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE):
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         if size < 0:
             size = lib.ZSTD_CONTENTSIZE_UNKNOWN
@@ -944,7 +1302,7 @@
         if not hasattr(ofh, 'write'):
             raise ValueError('second argument must have a write() method')
 
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         if size < 0:
             size = lib.ZSTD_CONTENTSIZE_UNKNOWN
@@ -976,10 +1334,10 @@
             in_buffer.pos = 0
 
             while in_buffer.pos < in_buffer.size:
-                zresult = lib.ZSTD_compress_generic(self._cctx,
-                                                    out_buffer,
-                                                    in_buffer,
-                                                    lib.ZSTD_e_continue)
+                zresult = lib.ZSTD_compressStream2(self._cctx,
+                                                   out_buffer,
+                                                   in_buffer,
+                                                   lib.ZSTD_e_continue)
                 if lib.ZSTD_isError(zresult):
                     raise ZstdError('zstd compress error: %s' %
                                     _zstd_error(zresult))
@@ -991,10 +1349,10 @@
 
         # We've finished reading. Flush the compressor.
         while True:
-            zresult = lib.ZSTD_compress_generic(self._cctx,
-                                                out_buffer,
-                                                in_buffer,
-                                                lib.ZSTD_e_end)
+            zresult = lib.ZSTD_compressStream2(self._cctx,
+                                               out_buffer,
+                                               in_buffer,
+                                               lib.ZSTD_e_end)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('error ending compression stream: %s' %
                                 _zstd_error(zresult))
@@ -1011,7 +1369,7 @@
 
     def stream_reader(self, source, size=-1,
                       read_size=COMPRESSION_RECOMMENDED_INPUT_SIZE):
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         try:
             size = len(source)
@@ -1026,20 +1384,22 @@
             raise ZstdError('error setting source size: %s' %
                             _zstd_error(zresult))
 
-        return CompressionReader(self, source, read_size)
+        return ZstdCompressionReader(self, source, read_size)
 
     def stream_writer(self, writer, size=-1,
-                 write_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE):
+                 write_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE,
+                 write_return_read=False):
 
         if not hasattr(writer, 'write'):
             raise ValueError('must pass an object with a write() method')
 
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         if size < 0:
             size = lib.ZSTD_CONTENTSIZE_UNKNOWN
 
-        return ZstdCompressionWriter(self, writer, size, write_size)
+        return ZstdCompressionWriter(self, writer, size, write_size,
+                                     write_return_read)
 
     write_to = stream_writer
 
@@ -1056,7 +1416,7 @@
             raise ValueError('must pass an object with a read() method or '
                              'conforms to buffer protocol')
 
-        lib.ZSTD_CCtx_reset(self._cctx)
+        lib.ZSTD_CCtx_reset(self._cctx, lib.ZSTD_reset_session_only)
 
         if size < 0:
             size = lib.ZSTD_CONTENTSIZE_UNKNOWN
@@ -1104,8 +1464,8 @@
             in_buffer.pos = 0
 
             while in_buffer.pos < in_buffer.size:
-                zresult = lib.ZSTD_compress_generic(self._cctx, out_buffer, in_buffer,
-                                                    lib.ZSTD_e_continue)
+                zresult = lib.ZSTD_compressStream2(self._cctx, out_buffer, in_buffer,
+                                                   lib.ZSTD_e_continue)
                 if lib.ZSTD_isError(zresult):
                     raise ZstdError('zstd compress error: %s' %
                                     _zstd_error(zresult))
@@ -1124,10 +1484,10 @@
         # remains.
         while True:
             assert out_buffer.pos == 0
-            zresult = lib.ZSTD_compress_generic(self._cctx,
-                                                out_buffer,
-                                                in_buffer,
-                                                lib.ZSTD_e_end)
+            zresult = lib.ZSTD_compressStream2(self._cctx,
+                                               out_buffer,
+                                               in_buffer,
+                                               lib.ZSTD_e_end)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('error ending compression stream: %s' %
                                 _zstd_error(zresult))
@@ -1234,7 +1594,7 @@
             cparams = ffi.new('ZSTD_compressionParameters')
             cparams.chainLog = compression_params.chain_log
             cparams.hashLog = compression_params.hash_log
-            cparams.searchLength = compression_params.min_match
+            cparams.minMatch = compression_params.min_match
             cparams.searchLog = compression_params.search_log
             cparams.strategy = compression_params.compression_strategy
             cparams.targetLength = compression_params.target_length
@@ -1345,6 +1705,10 @@
         out_buffer = ffi.new('ZSTD_outBuffer *')
 
         data_buffer = ffi.from_buffer(data)
+
+        if len(data_buffer) == 0:
+            return b''
+
         in_buffer.src = data_buffer
         in_buffer.size = len(data_buffer)
         in_buffer.pos = 0
@@ -1357,8 +1721,8 @@
         chunks = []
 
         while True:
-            zresult = lib.ZSTD_decompress_generic(self._decompressor._dctx,
-                                                  out_buffer, in_buffer)
+            zresult = lib.ZSTD_decompressStream(self._decompressor._dctx,
+                                                out_buffer, in_buffer)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd decompressor error: %s' %
                                 _zstd_error(zresult))
@@ -1378,12 +1742,16 @@
 
         return b''.join(chunks)
 
-
-class DecompressionReader(object):
-    def __init__(self, decompressor, source, read_size):
+    def flush(self, length=0):
+        pass
+
+
+class ZstdDecompressionReader(object):
+    def __init__(self, decompressor, source, read_size, read_across_frames):
         self._decompressor = decompressor
         self._source = source
         self._read_size = read_size
+        self._read_across_frames = bool(read_across_frames)
         self._entered = False
         self._closed = False
         self._bytes_decompressed = 0
@@ -1418,10 +1786,10 @@
         return True
 
     def readline(self):
-        raise NotImplementedError()
+        raise io.UnsupportedOperation()
 
     def readlines(self):
-        raise NotImplementedError()
+        raise io.UnsupportedOperation()
 
     def write(self, data):
         raise io.UnsupportedOperation()
@@ -1447,25 +1815,158 @@
         return self._bytes_decompressed
 
     def readall(self):
-        raise NotImplementedError()
+        chunks = []
+
+        while True:
+            chunk = self.read(1048576)
+            if not chunk:
+                break
+
+            chunks.append(chunk)
+
+        return b''.join(chunks)
 
     def __iter__(self):
-        raise NotImplementedError()
+        raise io.UnsupportedOperation()
 
     def __next__(self):
-        raise NotImplementedError()
+        raise io.UnsupportedOperation()
 
     next = __next__
 
-    def read(self, size):
+    def _read_input(self):
+        # We have data left over in the input buffer. Use it.
+        if self._in_buffer.pos < self._in_buffer.size:
+            return
+
+        # All input data exhausted. Nothing to do.
+        if self._finished_input:
+            return
+
+        # Else populate the input buffer from our source.
+        if hasattr(self._source, 'read'):
+            data = self._source.read(self._read_size)
+
+            if not data:
+                self._finished_input = True
+                return
+
+            self._source_buffer = ffi.from_buffer(data)
+            self._in_buffer.src = self._source_buffer
+            self._in_buffer.size = len(self._source_buffer)
+            self._in_buffer.pos = 0
+        else:
+            self._source_buffer = ffi.from_buffer(self._source)
+            self._in_buffer.src = self._source_buffer
+            self._in_buffer.size = len(self._source_buffer)
+            self._in_buffer.pos = 0
+
+    def _decompress_into_buffer(self, out_buffer):
+        """Decompress available input into an output buffer.
+
+        Returns True if data in output buffer should be emitted.
+        """
+        zresult = lib.ZSTD_decompressStream(self._decompressor._dctx,
+                                            out_buffer, self._in_buffer)
+
+        if self._in_buffer.pos == self._in_buffer.size:
+            self._in_buffer.src = ffi.NULL
+            self._in_buffer.pos = 0
+            self._in_buffer.size = 0
+            self._source_buffer = None
+
+            if not hasattr(self._source, 'read'):
+                self._finished_input = True
+
+        if lib.ZSTD_isError(zresult):
+            raise ZstdError('zstd decompress error: %s' %
+                            _zstd_error(zresult))
+
+        # Emit data if there is data AND either:
+        # a) output buffer is full (read amount is satisfied)
+        # b) we're at end of a frame and not in frame spanning mode
+        return (out_buffer.pos and
+                (out_buffer.pos == out_buffer.size or
+                 zresult == 0 and not self._read_across_frames))
+
+    def read(self, size=-1):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if size < -1:
+            raise ValueError('cannot read negative amounts less than -1')
+
+        if size == -1:
+            # This is recursive. But it gets the job done.
+            return self.readall()
+
+        if self._finished_output or size == 0:
+            return b''
+
+        # We /could/ call into readinto() here. But that introduces more
+        # overhead.
+        dst_buffer = ffi.new('char[]', size)
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dst_buffer
+        out_buffer.size = size
+        out_buffer.pos = 0
+
+        self._read_input()
+        if self._decompress_into_buffer(out_buffer):
+            self._bytes_decompressed += out_buffer.pos
+            return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+        while not self._finished_input:
+            self._read_input()
+            if self._decompress_into_buffer(out_buffer):
+                self._bytes_decompressed += out_buffer.pos
+                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+        self._bytes_decompressed += out_buffer.pos
+        return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
+
+    def readinto(self, b):
         if self._closed:
             raise ValueError('stream is closed')
 
         if self._finished_output:
+            return 0
+
+        # TODO use writable=True once we require CFFI >= 1.12.
+        dest_buffer = ffi.from_buffer(b)
+        ffi.memmove(b, b'', 0)
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dest_buffer
+        out_buffer.size = len(dest_buffer)
+        out_buffer.pos = 0
+
+        self._read_input()
+        if self._decompress_into_buffer(out_buffer):
+            self._bytes_decompressed += out_buffer.pos
+            return out_buffer.pos
+
+        while not self._finished_input:
+            self._read_input()
+            if self._decompress_into_buffer(out_buffer):
+                self._bytes_decompressed += out_buffer.pos
+                return out_buffer.pos
+
+        self._bytes_decompressed += out_buffer.pos
+        return out_buffer.pos
+
+    def read1(self, size=-1):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if size < -1:
+            raise ValueError('cannot read negative amounts less than -1')
+
+        if self._finished_output or size == 0:
             return b''
 
-        if size < 1:
-            raise ValueError('cannot read negative or size 0 amounts')
+        # -1 returns arbitrary number of bytes.
+        if size == -1:
+            size = DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE
 
         dst_buffer = ffi.new('char[]', size)
         out_buffer = ffi.new('ZSTD_outBuffer *')
@@ -1473,64 +1974,46 @@
         out_buffer.size = size
         out_buffer.pos = 0
 
-        def decompress():
-            zresult = lib.ZSTD_decompress_generic(self._decompressor._dctx,
-                                                  out_buffer, self._in_buffer)
-
-            if self._in_buffer.pos == self._in_buffer.size:
-                self._in_buffer.src = ffi.NULL
-                self._in_buffer.pos = 0
-                self._in_buffer.size = 0
-                self._source_buffer = None
-
-                if not hasattr(self._source, 'read'):
-                    self._finished_input = True
-
-            if lib.ZSTD_isError(zresult):
-                raise ZstdError('zstd decompress error: %s',
-                                _zstd_error(zresult))
-            elif zresult == 0:
-                self._finished_output = True
-
-            if out_buffer.pos and out_buffer.pos == out_buffer.size:
-                self._bytes_decompressed += out_buffer.size
-                return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
-
-        def get_input():
-            if self._finished_input:
-                return
-
-            if hasattr(self._source, 'read'):
-                data = self._source.read(self._read_size)
-
-                if not data:
-                    self._finished_input = True
-                    return
-
-                self._source_buffer = ffi.from_buffer(data)
-                self._in_buffer.src = self._source_buffer
-                self._in_buffer.size = len(self._source_buffer)
-                self._in_buffer.pos = 0
-            else:
-                self._source_buffer = ffi.from_buffer(self._source)
-                self._in_buffer.src = self._source_buffer
-                self._in_buffer.size = len(self._source_buffer)
-                self._in_buffer.pos = 0
-
-        get_input()
-        result = decompress()
-        if result:
-            return result
-
+        # read1() dictates that we can perform at most 1 call to underlying
+        # stream to get input. However, we can't satisfy this restriction with
+        # decompression because not all input generates output. So we allow
+        # multiple read(). But unlike read(), we stop once we have any output.
         while not self._finished_input:
-            get_input()
-            result = decompress()
-            if result:
-                return result
+            self._read_input()
+            self._decompress_into_buffer(out_buffer)
+
+            if out_buffer.pos:
+                break
 
         self._bytes_decompressed += out_buffer.pos
         return ffi.buffer(out_buffer.dst, out_buffer.pos)[:]
 
+    def readinto1(self, b):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        if self._finished_output:
+            return 0
+
+        # TODO use writable=True once we require CFFI >= 1.12.
+        dest_buffer = ffi.from_buffer(b)
+        ffi.memmove(b, b'', 0)
+
+        out_buffer = ffi.new('ZSTD_outBuffer *')
+        out_buffer.dst = dest_buffer
+        out_buffer.size = len(dest_buffer)
+        out_buffer.pos = 0
+
+        while not self._finished_input and not self._finished_output:
+            self._read_input()
+            self._decompress_into_buffer(out_buffer)
+
+            if out_buffer.pos:
+                break
+
+        self._bytes_decompressed += out_buffer.pos
+        return out_buffer.pos
+
     def seek(self, pos, whence=os.SEEK_SET):
         if self._closed:
             raise ValueError('stream is closed')
@@ -1569,34 +2052,108 @@
         return self._bytes_decompressed
 
 class ZstdDecompressionWriter(object):
-    def __init__(self, decompressor, writer, write_size):
+    def __init__(self, decompressor, writer, write_size, write_return_read):
+        decompressor._ensure_dctx()
+
         self._decompressor = decompressor
         self._writer = writer
         self._write_size = write_size
+        self._write_return_read = bool(write_return_read)
         self._entered = False
+        self._closed = False
 
     def __enter__(self):
+        if self._closed:
+            raise ValueError('stream is closed')
+
         if self._entered:
             raise ZstdError('cannot __enter__ multiple times')
 
-        self._decompressor._ensure_dctx()
         self._entered = True
 
         return self
 
     def __exit__(self, exc_type, exc_value, exc_tb):
         self._entered = False
+        self.close()
 
     def memory_size(self):
-        if not self._decompressor._dctx:
-            raise ZstdError('cannot determine size of inactive decompressor '
-                            'call when context manager is active')
-
         return lib.ZSTD_sizeof_DCtx(self._decompressor._dctx)
 
+    def close(self):
+        if self._closed:
+            return
+
+        try:
+            self.flush()
+        finally:
+            self._closed = True
+
+        f = getattr(self._writer, 'close', None)
+        if f:
+            f()
+
+    @property
+    def closed(self):
+        return self._closed
+
+    def fileno(self):
+        f = getattr(self._writer, 'fileno', None)
+        if f:
+            return f()
+        else:
+            raise OSError('fileno not available on underlying writer')
+
+    def flush(self):
+        if self._closed:
+            raise ValueError('stream is closed')
+
+        f = getattr(self._writer, 'flush', None)
+        if f:
+            return f()
+
+    def isatty(self):
+        return False
+
+    def readable(self):
+        return False
+
+    def readline(self, size=-1):
+        raise io.UnsupportedOperation()
+
+    def readlines(self, hint=-1):
+        raise io.UnsupportedOperation()
+
+    def seek(self, offset, whence=None):
+        raise io.UnsupportedOperation()
+
+    def seekable(self):
+        return False
+
+    def tell(self):
+        raise io.UnsupportedOperation()
+
+    def truncate(self, size=None):
+        raise io.UnsupportedOperation()
+
+    def writable(self):
+        return True
+
+    def writelines(self, lines):
+        raise io.UnsupportedOperation()
+
+    def read(self, size=-1):
+        raise io.UnsupportedOperation()
+
+    def readall(self):
+        raise io.UnsupportedOperation()
+
+    def readinto(self, b):
+        raise io.UnsupportedOperation()
+
     def write(self, data):
-        if not self._entered:
-            raise ZstdError('write must be called from an active context manager')
+        if self._closed:
+            raise ValueError('stream is closed')
 
         total_write = 0
 
@@ -1616,7 +2173,7 @@
         dctx = self._decompressor._dctx
 
         while in_buffer.pos < in_buffer.size:
-            zresult = lib.ZSTD_decompress_generic(dctx, out_buffer, in_buffer)
+            zresult = lib.ZSTD_decompressStream(dctx, out_buffer, in_buffer)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('zstd decompress error: %s' %
                                 _zstd_error(zresult))
@@ -1626,7 +2183,10 @@
                 total_write += out_buffer.pos
                 out_buffer.pos = 0
 
-        return total_write
+        if self._write_return_read:
+            return in_buffer.pos
+        else:
+            return total_write
 
 
 class ZstdDecompressor(object):
@@ -1684,7 +2244,7 @@
         in_buffer.size = len(data_buffer)
         in_buffer.pos = 0
 
-        zresult = lib.ZSTD_decompress_generic(self._dctx, out_buffer, in_buffer)
+        zresult = lib.ZSTD_decompressStream(self._dctx, out_buffer, in_buffer)
         if lib.ZSTD_isError(zresult):
             raise ZstdError('decompression error: %s' %
                             _zstd_error(zresult))
@@ -1696,9 +2256,10 @@
 
         return ffi.buffer(result_buffer, out_buffer.pos)[:]
 
-    def stream_reader(self, source, read_size=DECOMPRESSION_RECOMMENDED_INPUT_SIZE):
+    def stream_reader(self, source, read_size=DECOMPRESSION_RECOMMENDED_INPUT_SIZE,
+                      read_across_frames=False):
         self._ensure_dctx()
-        return DecompressionReader(self, source, read_size)
+        return ZstdDecompressionReader(self, source, read_size, read_across_frames)
 
     def decompressobj(self, write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE):
         if write_size < 1:
@@ -1767,7 +2328,7 @@
             while in_buffer.pos < in_buffer.size:
                 assert out_buffer.pos == 0
 
-                zresult = lib.ZSTD_decompress_generic(self._dctx, out_buffer, in_buffer)
+                zresult = lib.ZSTD_decompressStream(self._dctx, out_buffer, in_buffer)
                 if lib.ZSTD_isError(zresult):
                     raise ZstdError('zstd decompress error: %s' %
                                     _zstd_error(zresult))
@@ -1787,11 +2348,13 @@
 
     read_from = read_to_iter
 
-    def stream_writer(self, writer, write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE):
+    def stream_writer(self, writer, write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE,
+                      write_return_read=False):
         if not hasattr(writer, 'write'):
             raise ValueError('must pass an object with a write() method')
 
-        return ZstdDecompressionWriter(self, writer, write_size)
+        return ZstdDecompressionWriter(self, writer, write_size,
+                                       write_return_read)
 
     write_to = stream_writer
 
@@ -1829,7 +2392,7 @@
 
             # Flush all read data to output.
             while in_buffer.pos < in_buffer.size:
-                zresult = lib.ZSTD_decompress_generic(self._dctx, out_buffer, in_buffer)
+                zresult = lib.ZSTD_decompressStream(self._dctx, out_buffer, in_buffer)
                 if lib.ZSTD_isError(zresult):
                     raise ZstdError('zstd decompressor error: %s' %
                                     _zstd_error(zresult))
@@ -1881,7 +2444,7 @@
         in_buffer.size = len(chunk_buffer)
         in_buffer.pos = 0
 
-        zresult = lib.ZSTD_decompress_generic(self._dctx, out_buffer, in_buffer)
+        zresult = lib.ZSTD_decompressStream(self._dctx, out_buffer, in_buffer)
         if lib.ZSTD_isError(zresult):
             raise ZstdError('could not decompress chunk 0: %s' %
                             _zstd_error(zresult))
@@ -1918,7 +2481,7 @@
             in_buffer.size = len(chunk_buffer)
             in_buffer.pos = 0
 
-            zresult = lib.ZSTD_decompress_generic(self._dctx, out_buffer, in_buffer)
+            zresult = lib.ZSTD_decompressStream(self._dctx, out_buffer, in_buffer)
             if lib.ZSTD_isError(zresult):
                 raise ZstdError('could not decompress chunk %d: %s' %
                                 _zstd_error(zresult))
@@ -1931,7 +2494,7 @@
         return ffi.buffer(last_buffer, len(last_buffer))[:]
 
     def _ensure_dctx(self, load_dict=True):
-        lib.ZSTD_DCtx_reset(self._dctx)
+        lib.ZSTD_DCtx_reset(self._dctx, lib.ZSTD_reset_session_only)
 
         if self._max_window_size:
             zresult = lib.ZSTD_DCtx_setMaxWindowSize(self._dctx,
diff --git a/contrib/python-zstandard/zstd.c b/contrib/python-zstandard/zstd.c
--- a/contrib/python-zstandard/zstd.c
+++ b/contrib/python-zstandard/zstd.c
@@ -210,7 +210,7 @@
 	   We detect this mismatch here and refuse to load the module if this
 	   scenario is detected.
 	*/
-	if (ZSTD_VERSION_NUMBER != 10306 || ZSTD_versionNumber() != 10306) {
+	if (ZSTD_VERSION_NUMBER != 10308 || ZSTD_versionNumber() != 10308) {
 		PyErr_SetString(PyExc_ImportError, "zstd C API mismatch; Python bindings not compiled against expected zstd version");
 		return;
 	}
diff --git a/contrib/python-zstandard/zstd/common/bitstream.h b/contrib/python-zstandard/zstd/common/bitstream.h
--- a/contrib/python-zstandard/zstd/common/bitstream.h
+++ b/contrib/python-zstandard/zstd/common/bitstream.h
@@ -339,17 +339,10 @@
 
 MEM_STATIC size_t BIT_getMiddleBits(size_t bitContainer, U32 const start, U32 const nbBits)
 {
-#if defined(__BMI__) && defined(__GNUC__) && __GNUC__*1000+__GNUC_MINOR__ >= 4008  /* experimental */
-#  if defined(__x86_64__)
-    if (sizeof(bitContainer)==8)
-        return _bextr_u64(bitContainer, start, nbBits);
-    else
-#  endif
-        return _bextr_u32(bitContainer, start, nbBits);
-#else
+    U32 const regMask = sizeof(bitContainer)*8 - 1;
+    /* if start > regMask, bitstream is corrupted, and result is undefined */
     assert(nbBits < BIT_MASK_SIZE);
-    return (bitContainer >> start) & BIT_mask[nbBits];
-#endif
+    return (bitContainer >> (start & regMask)) & BIT_mask[nbBits];
 }
 
 MEM_STATIC size_t BIT_getLowerBits(size_t bitContainer, U32 const nbBits)
@@ -366,9 +359,13 @@
  * @return : value extracted */
 MEM_STATIC size_t BIT_lookBits(const BIT_DStream_t* bitD, U32 nbBits)
 {
-#if defined(__BMI__) && defined(__GNUC__)   /* experimental; fails if bitD->bitsConsumed + nbBits > sizeof(bitD->bitContainer)*8 */
+    /* arbitrate between double-shift and shift+mask */
+#if 1
+    /* if bitD->bitsConsumed + nbBits > sizeof(bitD->bitContainer)*8,
+     * bitstream is likely corrupted, and result is undefined */
     return BIT_getMiddleBits(bitD->bitContainer, (sizeof(bitD->bitContainer)*8) - bitD->bitsConsumed - nbBits, nbBits);
 #else
+    /* this code path is slower on my os-x laptop */
     U32 const regMask = sizeof(bitD->bitContainer)*8 - 1;
     return ((bitD->bitContainer << (bitD->bitsConsumed & regMask)) >> 1) >> ((regMask-nbBits) & regMask);
 #endif
@@ -392,7 +389,7 @@
  *  Read (consume) next n bits from local register and update.
  *  Pay attention to not read more than nbBits contained into local register.
  * @return : extracted value. */
-MEM_STATIC size_t BIT_readBits(BIT_DStream_t* bitD, U32 nbBits)
+MEM_STATIC size_t BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits)
 {
     size_t const value = BIT_lookBits(bitD, nbBits);
     BIT_skipBits(bitD, nbBits);
@@ -401,7 +398,7 @@
 
 /*! BIT_readBitsFast() :
  *  unsafe version; only works only if nbBits >= 1 */
-MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, U32 nbBits)
+MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits)
 {
     size_t const value = BIT_lookBitsFast(bitD, nbBits);
     assert(nbBits >= 1);
diff --git a/contrib/python-zstandard/zstd/common/compiler.h b/contrib/python-zstandard/zstd/common/compiler.h
--- a/contrib/python-zstandard/zstd/common/compiler.h
+++ b/contrib/python-zstandard/zstd/common/compiler.h
@@ -15,6 +15,8 @@
 *  Compiler specifics
 *********************************************************/
 /* force inlining */
+
+#if !defined(ZSTD_NO_INLINE)
 #if defined (__GNUC__) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L   /* C99 */
 #  define INLINE_KEYWORD inline
 #else
@@ -29,6 +31,13 @@
 #  define FORCE_INLINE_ATTR
 #endif
 
+#else
+
+#define INLINE_KEYWORD
+#define FORCE_INLINE_ATTR
+
+#endif
+
 /**
  * FORCE_INLINE_TEMPLATE is used to define C "templates", which take constant
  * parameters. They must be inlined for the compiler to elimininate the constant
@@ -89,23 +98,21 @@
 #endif
 
 /* prefetch
- * can be disabled, by declaring NO_PREFETCH macro
- * All prefetch invocations use a single default locality 2,
- * generating instruction prefetcht1,
- * which, according to Intel, means "load data into L2 cache".
- * This is a good enough "middle ground" for the time being,
- * though in theory, it would be better to specialize locality depending on data being prefetched.
- * Tests could not determine any sensible difference based on locality value. */
+ * can be disabled, by declaring NO_PREFETCH build macro */
 #if defined(NO_PREFETCH)
-#  define PREFETCH(ptr)     (void)(ptr)  /* disabled */
+#  define PREFETCH_L1(ptr)  (void)(ptr)  /* disabled */
+#  define PREFETCH_L2(ptr)  (void)(ptr)  /* disabled */
 #else
 #  if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_I86))  /* _mm_prefetch() is not defined outside of x86/x64 */
 #    include <mmintrin.h>   /* https://msdn.microsoft.com/fr-fr/library/84szxsww(v=vs.90).aspx */
-#    define PREFETCH(ptr)   _mm_prefetch((const char*)(ptr), _MM_HINT_T1)
+#    define PREFETCH_L1(ptr)  _mm_prefetch((const char*)(ptr), _MM_HINT_T0)
+#    define PREFETCH_L2(ptr)  _mm_prefetch((const char*)(ptr), _MM_HINT_T1)
 #  elif defined(__GNUC__) && ( (__GNUC__ >= 4) || ( (__GNUC__ == 3) && (__GNUC_MINOR__ >= 1) ) )
-#    define PREFETCH(ptr)   __builtin_prefetch((ptr), 0 /* rw==read */, 2 /* locality */)
+#    define PREFETCH_L1(ptr)  __builtin_prefetch((ptr), 0 /* rw==read */, 3 /* locality */)
+#    define PREFETCH_L2(ptr)  __builtin_prefetch((ptr), 0 /* rw==read */, 2 /* locality */)
 #  else
-#    define PREFETCH(ptr)   (void)(ptr)  /* disabled */
+#    define PREFETCH_L1(ptr) (void)(ptr)  /* disabled */
+#    define PREFETCH_L2(ptr) (void)(ptr)  /* disabled */
 #  endif
 #endif  /* NO_PREFETCH */
 
@@ -116,7 +123,7 @@
     size_t const _size = (size_t)(s);     \
     size_t _pos;                          \
     for (_pos=0; _pos<_size; _pos+=CACHELINE_SIZE) {  \
-        PREFETCH(_ptr + _pos);            \
+        PREFETCH_L2(_ptr + _pos);         \
     }                                     \
 }
 
diff --git a/contrib/python-zstandard/zstd/common/cpu.h b/contrib/python-zstandard/zstd/common/cpu.h
--- a/contrib/python-zstandard/zstd/common/cpu.h
+++ b/contrib/python-zstandard/zstd/common/cpu.h
@@ -78,7 +78,7 @@
       __asm__(
           "pushl %%ebx\n\t"
           "cpuid\n\t"
-          "movl %%ebx, %%eax\n\r"
+          "movl %%ebx, %%eax\n\t"
           "popl %%ebx"
           : "=a"(f7b), "=c"(f7c)
           : "a"(7), "c"(0)
diff --git a/contrib/python-zstandard/zstd/common/debug.h b/contrib/python-zstandard/zstd/common/debug.h
--- a/contrib/python-zstandard/zstd/common/debug.h
+++ b/contrib/python-zstandard/zstd/common/debug.h
@@ -57,9 +57,9 @@
 #endif
 
 
-/* static assert is triggered at compile time, leaving no runtime artefact,
- * but can only work with compile-time constants.
- * This variant can only be used inside a function. */
+/* static assert is triggered at compile time, leaving no runtime artefact.
+ * static assert only works with compile-time constants.
+ * Also, this variant can only be used inside a function. */
 #define DEBUG_STATIC_ASSERT(c) (void)sizeof(char[(c) ? 1 : -1])
 
 
@@ -70,9 +70,19 @@
 #  define DEBUGLEVEL 0
 #endif
 
+
+/* DEBUGFILE can be defined externally,
+ * typically through compiler command line.
+ * note : currently useless.
+ * Value must be stderr or stdout */
+#ifndef DEBUGFILE
+#  define DEBUGFILE stderr
+#endif
+
+
 /* recommended values for DEBUGLEVEL :
- * 0 : no debug, all run-time functions disabled
- * 1 : no display, enables assert() only
+ * 0 : release mode, no debug, all run-time checks disabled
+ * 1 : enables assert() only, no display
  * 2 : reserved, for currently active debug path
  * 3 : events once per object lifetime (CCtx, CDict, etc.)
  * 4 : events once per frame
@@ -81,7 +91,7 @@
  * 7+: events at every position (*very* verbose)
  *
  * It's generally inconvenient to output traces > 5.
- * In which case, it's possible to selectively enable higher verbosity levels
+ * In which case, it's possible to selectively trigger high verbosity levels
  * by modifying g_debug_level.
  */
 
@@ -95,11 +105,12 @@
 
 #if (DEBUGLEVEL>=2)
 #  include <stdio.h>
-extern int g_debuglevel; /* here, this variable is only declared,
-                           it actually lives in debug.c,
-                           and is shared by the whole process.
-                           It's typically used to enable very verbose levels
-                           on selective conditions (such as position in src) */
+extern int g_debuglevel; /* the variable is only declared,
+                            it actually lives in debug.c,
+                            and is shared by the whole process.
+                            It's not thread-safe.
+                            It's useful when enabling very verbose levels
+                            on selective conditions (such as position in src) */
 
 #  define RAWLOG(l, ...) {                                      \
                 if (l<=g_debuglevel) {                          \
diff --git a/contrib/python-zstandard/zstd/common/error_private.c b/contrib/python-zstandard/zstd/common/error_private.c
--- a/contrib/python-zstandard/zstd/common/error_private.c
+++ b/contrib/python-zstandard/zstd/common/error_private.c
@@ -14,6 +14,10 @@
 
 const char* ERR_getErrorString(ERR_enum code)
 {
+#ifdef ZSTD_STRIP_ERROR_STRINGS
+    (void)code;
+    return "Error strings stripped";
+#else
     static const char* const notErrorCode = "Unspecified error code";
     switch( code )
     {
@@ -39,10 +43,12 @@
     case PREFIX(dictionaryCreation_failed): return "Cannot create Dictionary from provided samples";
     case PREFIX(dstSize_tooSmall): return "Destination buffer is too small";
     case PREFIX(srcSize_wrong): return "Src size is incorrect";
+    case PREFIX(dstBuffer_null): return "Operation on NULL destination buffer";
         /* following error codes are not stable and may be removed or changed in a future version */
     case PREFIX(frameIndex_tooLarge): return "Frame index is too large";
     case PREFIX(seekableIO): return "An I/O error occurred when reading/seeking";
     case PREFIX(maxCode):
     default: return notErrorCode;
     }
+#endif
 }
diff --git a/contrib/python-zstandard/zstd/common/fse.h b/contrib/python-zstandard/zstd/common/fse.h
--- a/contrib/python-zstandard/zstd/common/fse.h
+++ b/contrib/python-zstandard/zstd/common/fse.h
@@ -512,7 +512,7 @@
     const U32 tableLog = MEM_read16(ptr);
     statePtr->value = (ptrdiff_t)1<<tableLog;
     statePtr->stateTable = u16ptr+2;
-    statePtr->symbolTT = ((const U32*)ct + 1 + (tableLog ? (1<<(tableLog-1)) : 1));
+    statePtr->symbolTT = ct + 1 + (tableLog ? (1<<(tableLog-1)) : 1);
     statePtr->stateLog = tableLog;
 }
 
@@ -531,7 +531,7 @@
     }
 }
 
-MEM_STATIC void FSE_encodeSymbol(BIT_CStream_t* bitC, FSE_CState_t* statePtr, U32 symbol)
+MEM_STATIC void FSE_encodeSymbol(BIT_CStream_t* bitC, FSE_CState_t* statePtr, unsigned symbol)
 {
     FSE_symbolCompressionTransform const symbolTT = ((const FSE_symbolCompressionTransform*)(statePtr->symbolTT))[symbol];
     const U16* const stateTable = (const U16*)(statePtr->stateTable);
diff --git a/contrib/python-zstandard/zstd/common/huf.h b/contrib/python-zstandard/zstd/common/huf.h
--- a/contrib/python-zstandard/zstd/common/huf.h
+++ b/contrib/python-zstandard/zstd/common/huf.h
@@ -173,15 +173,19 @@
 *  Advanced decompression functions
 ******************************************/
 size_t HUF_decompress4X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< single-symbol decoder */
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress4X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< double-symbols decoder */
+#endif
 
 size_t HUF_decompress4X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< decodes RLE and uncompressed */
 size_t HUF_decompress4X_hufOnly(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize); /**< considers RLE and uncompressed as errors */
 size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize); /**< considers RLE and uncompressed as errors */
 size_t HUF_decompress4X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< single-symbol decoder */
 size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /**< single-symbol decoder */
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress4X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< double-symbols decoder */
 size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /**< double-symbols decoder */
+#endif
 
 
 /* ****************************************
@@ -228,7 +232,7 @@
 #define HUF_CTABLE_WORKSPACE_SIZE_U32 (2*HUF_SYMBOLVALUE_MAX +1 +1)
 #define HUF_CTABLE_WORKSPACE_SIZE (HUF_CTABLE_WORKSPACE_SIZE_U32 * sizeof(unsigned))
 size_t HUF_buildCTable_wksp (HUF_CElt* tree,
-                       const U32* count, U32 maxSymbolValue, U32 maxNbBits,
+                       const unsigned* count, U32 maxSymbolValue, U32 maxNbBits,
                              void* workSpace, size_t wkspSize);
 
 /*! HUF_readStats() :
@@ -277,14 +281,22 @@
 #define HUF_DECOMPRESS_WORKSPACE_SIZE (2 << 10)
 #define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32))
 
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_readDTableX1 (HUF_DTable* DTable, const void* src, size_t srcSize);
 size_t HUF_readDTableX1_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
+#endif
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize);
 size_t HUF_readDTableX2_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
+#endif
 
 size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_decompress4X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+#endif
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress4X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+#endif
 
 
 /* ====================== */
@@ -306,24 +318,36 @@
                        HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2);
 
 size_t HUF_decompress1X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* single-symbol decoder */
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* double-symbol decoder */
+#endif
 
 size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);
 size_t HUF_decompress1X_DCtx_wksp (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_decompress1X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< single-symbol decoder */
 size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /**< single-symbol decoder */
+#endif
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress1X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /**< double-symbols decoder */
 size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /**< double-symbols decoder */
+#endif
 
 size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);   /**< automatic selection of sing or double symbol decoder, based on DTable */
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_decompress1X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+#endif
+#ifndef HUF_FORCE_DECOMPRESS_X1
 size_t HUF_decompress1X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+#endif
 
 /* BMI2 variants.
  * If the CPU has BMI2 support, pass bmi2=1, otherwise pass bmi2=0.
  */
 size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+#endif
 size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
 size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
 
diff --git a/contrib/python-zstandard/zstd/common/mem.h b/contrib/python-zstandard/zstd/common/mem.h
--- a/contrib/python-zstandard/zstd/common/mem.h
+++ b/contrib/python-zstandard/zstd/common/mem.h
@@ -39,6 +39,10 @@
 #  define MEM_STATIC static  /* this version may generate warnings for unused static functions; disable the relevant warning */
 #endif
 
+#ifndef __has_builtin
+#  define __has_builtin(x) 0  /* compat. with non-clang compilers */
+#endif
+
 /* code only tested on 32 and 64 bits systems */
 #define MEM_STATIC_ASSERT(c)   { enum { MEM_static_assert = 1/(int)(!!(c)) }; }
 MEM_STATIC void MEM_check(void) { MEM_STATIC_ASSERT((sizeof(size_t)==4) || (sizeof(size_t)==8)); }
@@ -198,7 +202,8 @@
 {
 #if defined(_MSC_VER)     /* Visual Studio */
     return _byteswap_ulong(in);
-#elif defined (__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 403)
+#elif (defined (__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 403)) \
+  || (defined(__clang__) && __has_builtin(__builtin_bswap32))
     return __builtin_bswap32(in);
 #else
     return  ((in << 24) & 0xff000000 ) |
@@ -212,7 +217,8 @@
 {
 #if defined(_MSC_VER)     /* Visual Studio */
     return _byteswap_uint64(in);
-#elif defined (__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 403)
+#elif (defined (__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__ >= 403)) \
+  || (defined(__clang__) && __has_builtin(__builtin_bswap64))
     return __builtin_bswap64(in);
 #else
     return  ((in << 56) & 0xff00000000000000ULL) |
diff --git a/contrib/python-zstandard/zstd/common/pool.c b/contrib/python-zstandard/zstd/common/pool.c
--- a/contrib/python-zstandard/zstd/common/pool.c
+++ b/contrib/python-zstandard/zstd/common/pool.c
@@ -88,8 +88,8 @@
             ctx->numThreadsBusy++;
             ctx->queueEmpty = ctx->queueHead == ctx->queueTail;
             /* Unlock the mutex, signal a pusher, and run the job */
+            ZSTD_pthread_cond_signal(&ctx->queuePushCond);
             ZSTD_pthread_mutex_unlock(&ctx->queueMutex);
-            ZSTD_pthread_cond_signal(&ctx->queuePushCond);
 
             job.function(job.opaque);
 
diff --git a/contrib/python-zstandard/zstd/common/zstd_common.c b/contrib/python-zstandard/zstd/common/zstd_common.c
--- a/contrib/python-zstandard/zstd/common/zstd_common.c
+++ b/contrib/python-zstandard/zstd/common/zstd_common.c
@@ -30,8 +30,10 @@
 /*-****************************************
 *  ZSTD Error Management
 ******************************************/
+#undef ZSTD_isError   /* defined within zstd_internal.h */
 /*! ZSTD_isError() :
- *  tells if a return value is an error code */
+ *  tells if a return value is an error code
+ *  symbol is required for external callers */
 unsigned ZSTD_isError(size_t code) { return ERR_isError(code); }
 
 /*! ZSTD_getErrorName() :
diff --git a/contrib/python-zstandard/zstd/common/zstd_errors.h b/contrib/python-zstandard/zstd/common/zstd_errors.h
--- a/contrib/python-zstandard/zstd/common/zstd_errors.h
+++ b/contrib/python-zstandard/zstd/common/zstd_errors.h
@@ -72,6 +72,7 @@
   ZSTD_error_workSpace_tooSmall= 66,
   ZSTD_error_dstSize_tooSmall = 70,
   ZSTD_error_srcSize_wrong    = 72,
+  ZSTD_error_dstBuffer_null   = 74,
   /* following error codes are __NOT STABLE__, they can be removed or changed in future versions */
   ZSTD_error_frameIndex_tooLarge = 100,
   ZSTD_error_seekableIO          = 102,
diff --git a/contrib/python-zstandard/zstd/common/zstd_internal.h b/contrib/python-zstandard/zstd/common/zstd_internal.h
--- a/contrib/python-zstandard/zstd/common/zstd_internal.h
+++ b/contrib/python-zstandard/zstd/common/zstd_internal.h
@@ -41,6 +41,9 @@
 
 /* ---- static assert (debug) --- */
 #define ZSTD_STATIC_ASSERT(c) DEBUG_STATIC_ASSERT(c)
+#define ZSTD_isError ERR_isError   /* for inlining */
+#define FSE_isError  ERR_isError
+#define HUF_isError  ERR_isError
 
 
 /*-*************************************
@@ -75,7 +78,6 @@
 #define BIT0   1
 
 #define ZSTD_WINDOWLOG_ABSOLUTEMIN 10
-#define ZSTD_WINDOWLOG_DEFAULTMAX 27 /* Default maximum allowed window log */
 static const size_t ZSTD_fcs_fieldSize[4] = { 0, 2, 4, 8 };
 static const size_t ZSTD_did_fieldSize[4] = { 0, 1, 2, 4 };
 
@@ -242,7 +244,7 @@
     blockType_e blockType;
     U32 lastBlock;
     U32 origSize;
-} blockProperties_t;
+} blockProperties_t;   /* declared here for decompress and fullbench */
 
 /*! ZSTD_getcBlockSize() :
  *  Provides the size of compressed block from block header `src` */
@@ -250,6 +252,13 @@
 size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
                           blockProperties_t* bpPtr);
 
+/*! ZSTD_decodeSeqHeaders() :
+ *  decode sequence header from src */
+/* Used by: decompress, fullbench (does not get its definition from here) */
+size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
+                       const void* src, size_t srcSize);
+
+
 #if defined (__cplusplus)
 }
 #endif
diff --git a/contrib/python-zstandard/zstd/compress/fse_compress.c b/contrib/python-zstandard/zstd/compress/fse_compress.c
--- a/contrib/python-zstandard/zstd/compress/fse_compress.c
+++ b/contrib/python-zstandard/zstd/compress/fse_compress.c
@@ -115,7 +115,7 @@
     /* symbol start positions */
     {   U32 u;
         cumul[0] = 0;
-        for (u=1; u<=maxSymbolValue+1; u++) {
+        for (u=1; u <= maxSymbolValue+1; u++) {
             if (normalizedCounter[u-1]==-1) {  /* Low proba symbol */
                 cumul[u] = cumul[u-1] + 1;
                 tableSymbol[highThreshold--] = (FSE_FUNCTION_TYPE)(u-1);
@@ -658,7 +658,7 @@
     BYTE* op = ostart;
     BYTE* const oend = ostart + dstSize;
 
-    U32   count[FSE_MAX_SYMBOL_VALUE+1];
+    unsigned count[FSE_MAX_SYMBOL_VALUE+1];
     S16   norm[FSE_MAX_SYMBOL_VALUE+1];
     FSE_CTable* CTable = (FSE_CTable*)workSpace;
     size_t const CTableSize = FSE_CTABLE_SIZE_U32(tableLog, maxSymbolValue);
@@ -672,7 +672,7 @@
     if (!tableLog) tableLog = FSE_DEFAULT_TABLELOG;
 
     /* Scan input and build symbol stats */
-    {   CHECK_V_F(maxCount, HIST_count_wksp(count, &maxSymbolValue, src, srcSize, (unsigned*)scratchBuffer) );
+    {   CHECK_V_F(maxCount, HIST_count_wksp(count, &maxSymbolValue, src, srcSize, scratchBuffer, scratchBufferSize) );
         if (maxCount == srcSize) return 1;   /* only a single symbol in src : rle */
         if (maxCount == 1) return 0;         /* each symbol present maximum once => not compressible */
         if (maxCount < (srcSize >> 7)) return 0;   /* Heuristic : not compressible enough */
diff --git a/contrib/python-zstandard/zstd/compress/hist.h b/contrib/python-zstandard/zstd/compress/hist.h
--- a/contrib/python-zstandard/zstd/compress/hist.h
+++ b/contrib/python-zstandard/zstd/compress/hist.h
@@ -41,11 +41,11 @@
 
 /*! HIST_count():
  *  Provides the precise count of each byte within a table 'count'.
- *  'count' is a table of unsigned int, of minimum size (*maxSymbolValuePtr+1).
+ * 'count' is a table of unsigned int, of minimum size (*maxSymbolValuePtr+1).
  *  Updates *maxSymbolValuePtr with actual largest symbol value detected.
- *  @return : count of the most frequent symbol (which isn't identified).
- *            or an error code, which can be tested using HIST_isError().
- *            note : if return == srcSize, there is only one symbol.
+ * @return : count of the most frequent symbol (which isn't identified).
+ *           or an error code, which can be tested using HIST_isError().
+ *           note : if return == srcSize, there is only one symbol.
  */
 size_t HIST_count(unsigned* count, unsigned* maxSymbolValuePtr,
                   const void* src, size_t srcSize);
@@ -56,14 +56,16 @@
 /* --- advanced histogram functions --- */
 
 #define HIST_WKSP_SIZE_U32 1024
+#define HIST_WKSP_SIZE    (HIST_WKSP_SIZE_U32 * sizeof(unsigned))
 /** HIST_count_wksp() :
  *  Same as HIST_count(), but using an externally provided scratch buffer.
  *  Benefit is this function will use very little stack space.
- * `workSpace` must be a table of unsigned of size >= HIST_WKSP_SIZE_U32
+ * `workSpace` is a writable buffer which must be 4-bytes aligned,
+ * `workSpaceSize` must be >= HIST_WKSP_SIZE
  */
 size_t HIST_count_wksp(unsigned* count, unsigned* maxSymbolValuePtr,
                        const void* src, size_t srcSize,
-                       unsigned* workSpace);
+                       void* workSpace, size_t workSpaceSize);
 
 /** HIST_countFast() :
  *  same as HIST_count(), but blindly trusts that all byte values within src are <= *maxSymbolValuePtr.
@@ -74,11 +76,12 @@
 
 /** HIST_countFast_wksp() :
  *  Same as HIST_countFast(), but using an externally provided scratch buffer.
- * `workSpace` must be a table of unsigned of size >= HIST_WKSP_SIZE_U32
+ * `workSpace` is a writable buffer which must be 4-bytes aligned,
+ * `workSpaceSize` must be >= HIST_WKSP_SIZE
  */
 size_t HIST_countFast_wksp(unsigned* count, unsigned* maxSymbolValuePtr,
                            const void* src, size_t srcSize,
-                           unsigned* workSpace);
+                           void* workSpace, size_t workSpaceSize);
 
 /*! HIST_count_simple() :
  *  Same as HIST_countFast(), this function is unsafe,
diff --git a/contrib/python-zstandard/zstd/compress/hist.c b/contrib/python-zstandard/zstd/compress/hist.c
--- a/contrib/python-zstandard/zstd/compress/hist.c
+++ b/contrib/python-zstandard/zstd/compress/hist.c
@@ -73,6 +73,7 @@
     return largestCount;
 }
 
+typedef enum { trustInput, checkMaxSymbolValue } HIST_checkInput_e;
 
 /* HIST_count_parallel_wksp() :
  * store histogram into 4 intermediate tables, recombined at the end.
@@ -85,8 +86,8 @@
 static size_t HIST_count_parallel_wksp(
                                 unsigned* count, unsigned* maxSymbolValuePtr,
                                 const void* source, size_t sourceSize,
-                                unsigned checkMax,
-                                unsigned* const workSpace)
+                                HIST_checkInput_e check,
+                                U32* const workSpace)
 {
     const BYTE* ip = (const BYTE*)source;
     const BYTE* const iend = ip+sourceSize;
@@ -137,7 +138,7 @@
     /* finish last symbols */
     while (ip<iend) Counting1[*ip++]++;
 
-    if (checkMax) {   /* verify stats will fit into destination table */
+    if (check) {   /* verify stats will fit into destination table */
         U32 s; for (s=255; s>maxSymbolValue; s--) {
             Counting1[s] += Counting2[s] + Counting3[s] + Counting4[s];
             if (Counting1[s]) return ERROR(maxSymbolValue_tooSmall);
@@ -157,14 +158,18 @@
 
 /* HIST_countFast_wksp() :
  * Same as HIST_countFast(), but using an externally provided scratch buffer.
- * `workSpace` size must be table of >= HIST_WKSP_SIZE_U32 unsigned */
+ * `workSpace` is a writable buffer which must be 4-bytes aligned,
+ * `workSpaceSize` must be >= HIST_WKSP_SIZE
+ */
 size_t HIST_countFast_wksp(unsigned* count, unsigned* maxSymbolValuePtr,
                           const void* source, size_t sourceSize,
-                          unsigned* workSpace)
+                          void* workSpace, size_t workSpaceSize)
 {
     if (sourceSize < 1500) /* heuristic threshold */
         return HIST_count_simple(count, maxSymbolValuePtr, source, sourceSize);
-    return HIST_count_parallel_wksp(count, maxSymbolValuePtr, source, sourceSize, 0, workSpace);
+    if ((size_t)workSpace & 3) return ERROR(GENERIC);  /* must be aligned on 4-bytes boundaries */
+    if (workSpaceSize < HIST_WKSP_SIZE) return ERROR(workSpace_tooSmall);
+    return HIST_count_parallel_wksp(count, maxSymbolValuePtr, source, sourceSize, trustInput, (U32*)workSpace);
 }
 
 /* fast variant (unsafe : won't check if src contains values beyond count[] limit) */
@@ -172,24 +177,27 @@
                      const void* source, size_t sourceSize)
 {
     unsigned tmpCounters[HIST_WKSP_SIZE_U32];
-    return HIST_countFast_wksp(count, maxSymbolValuePtr, source, sourceSize, tmpCounters);
+    return HIST_countFast_wksp(count, maxSymbolValuePtr, source, sourceSize, tmpCounters, sizeof(tmpCounters));
 }
 
 /* HIST_count_wksp() :
  * Same as HIST_count(), but using an externally provided scratch buffer.
  * `workSpace` size must be table of >= HIST_WKSP_SIZE_U32 unsigned */
 size_t HIST_count_wksp(unsigned* count, unsigned* maxSymbolValuePtr,
-                 const void* source, size_t sourceSize, unsigned* workSpace)
+                       const void* source, size_t sourceSize,
+                       void* workSpace, size_t workSpaceSize)
 {
+    if ((size_t)workSpace & 3) return ERROR(GENERIC);  /* must be aligned on 4-bytes boundaries */
+    if (workSpaceSize < HIST_WKSP_SIZE) return ERROR(workSpace_tooSmall);
     if (*maxSymbolValuePtr < 255)
-        return HIST_count_parallel_wksp(count, maxSymbolValuePtr, source, sourceSize, 1, workSpace);
+        return HIST_count_parallel_wksp(count, maxSymbolValuePtr, source, sourceSize, checkMaxSymbolValue, (U32*)workSpace);
     *maxSymbolValuePtr = 255;
-    return HIST_countFast_wksp(count, maxSymbolValuePtr, source, sourceSize, workSpace);
+    return HIST_countFast_wksp(count, maxSymbolValuePtr, source, sourceSize, workSpace, workSpaceSize);
 }
 
 size_t HIST_count(unsigned* count, unsigned* maxSymbolValuePtr,
                  const void* src, size_t srcSize)
 {
     unsigned tmpCounters[HIST_WKSP_SIZE_U32];
-    return HIST_count_wksp(count, maxSymbolValuePtr, src, srcSize, tmpCounters);
+    return HIST_count_wksp(count, maxSymbolValuePtr, src, srcSize, tmpCounters, sizeof(tmpCounters));
 }
diff --git a/contrib/python-zstandard/zstd/compress/huf_compress.c b/contrib/python-zstandard/zstd/compress/huf_compress.c
--- a/contrib/python-zstandard/zstd/compress/huf_compress.c
+++ b/contrib/python-zstandard/zstd/compress/huf_compress.c
@@ -88,13 +88,13 @@
     BYTE* op = ostart;
     BYTE* const oend = ostart + dstSize;
 
-    U32 maxSymbolValue = HUF_TABLELOG_MAX;
+    unsigned maxSymbolValue = HUF_TABLELOG_MAX;
     U32 tableLog = MAX_FSE_TABLELOG_FOR_HUFF_HEADER;
 
     FSE_CTable CTable[FSE_CTABLE_SIZE_U32(MAX_FSE_TABLELOG_FOR_HUFF_HEADER, HUF_TABLELOG_MAX)];
     BYTE scratchBuffer[1<<MAX_FSE_TABLELOG_FOR_HUFF_HEADER];
 
-    U32 count[HUF_TABLELOG_MAX+1];
+    unsigned count[HUF_TABLELOG_MAX+1];
     S16 norm[HUF_TABLELOG_MAX+1];
 
     /* init conditions */
@@ -134,7 +134,7 @@
     `CTable` : Huffman tree to save, using huf representation.
     @return : size of saved CTable */
 size_t HUF_writeCTable (void* dst, size_t maxDstSize,
-                        const HUF_CElt* CTable, U32 maxSymbolValue, U32 huffLog)
+                        const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog)
 {
     BYTE bitsToWeight[HUF_TABLELOG_MAX + 1];   /* precomputed conversion table */
     BYTE huffWeight[HUF_SYMBOLVALUE_MAX];
@@ -169,7 +169,7 @@
 }
 
 
-size_t HUF_readCTable (HUF_CElt* CTable, U32* maxSymbolValuePtr, const void* src, size_t srcSize)
+size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void* src, size_t srcSize)
 {
     BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1];   /* init not required, even though some static analyzer may complain */
     U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1];   /* large enough for values from 0 to 16 */
@@ -315,7 +315,7 @@
     U32 current;
 } rankPos;
 
-static void HUF_sort(nodeElt* huffNode, const U32* count, U32 maxSymbolValue)
+static void HUF_sort(nodeElt* huffNode, const unsigned* count, U32 maxSymbolValue)
 {
     rankPos rank[32];
     U32 n;
@@ -347,7 +347,7 @@
  */
 #define STARTNODE (HUF_SYMBOLVALUE_MAX+1)
 typedef nodeElt huffNodeTable[HUF_CTABLE_WORKSPACE_SIZE_U32];
-size_t HUF_buildCTable_wksp (HUF_CElt* tree, const U32* count, U32 maxSymbolValue, U32 maxNbBits, void* workSpace, size_t wkspSize)
+size_t HUF_buildCTable_wksp (HUF_CElt* tree, const unsigned* count, U32 maxSymbolValue, U32 maxNbBits, void* workSpace, size_t wkspSize)
 {
     nodeElt* const huffNode0 = (nodeElt*)workSpace;
     nodeElt* const huffNode = huffNode0+1;
@@ -421,7 +421,7 @@
  * @return : maxNbBits
  *  Note : count is used before tree is written, so they can safely overlap
  */
-size_t HUF_buildCTable (HUF_CElt* tree, const U32* count, U32 maxSymbolValue, U32 maxNbBits)
+size_t HUF_buildCTable (HUF_CElt* tree, const unsigned* count, unsigned maxSymbolValue, unsigned maxNbBits)
 {
     huffNodeTable nodeTable;
     return HUF_buildCTable_wksp(tree, count, maxSymbolValue, maxNbBits, nodeTable, sizeof(nodeTable));
@@ -610,13 +610,14 @@
     return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize, CTable, /* bmi2 */ 0);
 }
 
+typedef enum { HUF_singleStream, HUF_fourStreams } HUF_nbStreams_e;
 
 static size_t HUF_compressCTable_internal(
                 BYTE* const ostart, BYTE* op, BYTE* const oend,
                 const void* src, size_t srcSize,
-                unsigned singleStream, const HUF_CElt* CTable, const int bmi2)
+                HUF_nbStreams_e nbStreams, const HUF_CElt* CTable, const int bmi2)
 {
-    size_t const cSize = singleStream ?
+    size_t const cSize = (nbStreams==HUF_singleStream) ?
                          HUF_compress1X_usingCTable_internal(op, oend - op, src, srcSize, CTable, bmi2) :
                          HUF_compress4X_usingCTable_internal(op, oend - op, src, srcSize, CTable, bmi2);
     if (HUF_isError(cSize)) { return cSize; }
@@ -628,21 +629,21 @@
 }
 
 typedef struct {
-    U32 count[HUF_SYMBOLVALUE_MAX + 1];
+    unsigned count[HUF_SYMBOLVALUE_MAX + 1];
     HUF_CElt CTable[HUF_SYMBOLVALUE_MAX + 1];
     huffNodeTable nodeTable;
 } HUF_compress_tables_t;
 
 /* HUF_compress_internal() :
  * `workSpace` must a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
-static size_t HUF_compress_internal (
-                void* dst, size_t dstSize,
-                const void* src, size_t srcSize,
-                unsigned maxSymbolValue, unsigned huffLog,
-                unsigned singleStream,
-                void* workSpace, size_t wkspSize,
-                HUF_CElt* oldHufTable, HUF_repeat* repeat, int preferRepeat,
-                const int bmi2)
+static size_t
+HUF_compress_internal (void* dst, size_t dstSize,
+                 const void* src, size_t srcSize,
+                       unsigned maxSymbolValue, unsigned huffLog,
+                       HUF_nbStreams_e nbStreams,
+                       void* workSpace, size_t wkspSize,
+                       HUF_CElt* oldHufTable, HUF_repeat* repeat, int preferRepeat,
+                 const int bmi2)
 {
     HUF_compress_tables_t* const table = (HUF_compress_tables_t*)workSpace;
     BYTE* const ostart = (BYTE*)dst;
@@ -651,7 +652,7 @@
 
     /* checks & inits */
     if (((size_t)workSpace & 3) != 0) return ERROR(GENERIC);  /* must be aligned on 4-bytes boundaries */
-    if (wkspSize < sizeof(*table)) return ERROR(workSpace_tooSmall);
+    if (wkspSize < HUF_WORKSPACE_SIZE) return ERROR(workSpace_tooSmall);
     if (!srcSize) return 0;  /* Uncompressed */
     if (!dstSize) return 0;  /* cannot fit anything within dst budget */
     if (srcSize > HUF_BLOCKSIZE_MAX) return ERROR(srcSize_wrong);   /* current block size limit */
@@ -664,11 +665,11 @@
     if (preferRepeat && repeat && *repeat == HUF_repeat_valid) {
         return HUF_compressCTable_internal(ostart, op, oend,
                                            src, srcSize,
-                                           singleStream, oldHufTable, bmi2);
+                                           nbStreams, oldHufTable, bmi2);
     }
 
     /* Scan input and build symbol stats */
-    {   CHECK_V_F(largest, HIST_count_wksp (table->count, &maxSymbolValue, (const BYTE*)src, srcSize, table->count) );
+    {   CHECK_V_F(largest, HIST_count_wksp (table->count, &maxSymbolValue, (const BYTE*)src, srcSize, workSpace, wkspSize) );
         if (largest == srcSize) { *ostart = ((const BYTE*)src)[0]; return 1; }   /* single symbol, rle */
         if (largest <= (srcSize >> 7)+4) return 0;   /* heuristic : probably not compressible enough */
     }
@@ -683,14 +684,15 @@
     if (preferRepeat && repeat && *repeat != HUF_repeat_none) {
         return HUF_compressCTable_internal(ostart, op, oend,
                                            src, srcSize,
-                                           singleStream, oldHufTable, bmi2);
+                                           nbStreams, oldHufTable, bmi2);
     }
 
     /* Build Huffman Tree */
     huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue);
-    {   CHECK_V_F(maxBits, HUF_buildCTable_wksp(table->CTable, table->count,
-                                                maxSymbolValue, huffLog,
-                                                table->nodeTable, sizeof(table->nodeTable)) );
+    {   size_t const maxBits = HUF_buildCTable_wksp(table->CTable, table->count,
+                                            maxSymbolValue, huffLog,
+                                            table->nodeTable, sizeof(table->nodeTable));
+        CHECK_F(maxBits);
         huffLog = (U32)maxBits;
         /* Zero unused symbols in CTable, so we can check it for validity */
         memset(table->CTable + (maxSymbolValue + 1), 0,
@@ -706,7 +708,7 @@
             if (oldSize <= hSize + newSize || hSize + 12 >= srcSize) {
                 return HUF_compressCTable_internal(ostart, op, oend,
                                                    src, srcSize,
-                                                   singleStream, oldHufTable, bmi2);
+                                                   nbStreams, oldHufTable, bmi2);
         }   }
 
         /* Use the new huffman table */
@@ -718,7 +720,7 @@
     }
     return HUF_compressCTable_internal(ostart, op, oend,
                                        src, srcSize,
-                                       singleStream, table->CTable, bmi2);
+                                       nbStreams, table->CTable, bmi2);
 }
 
 
@@ -728,7 +730,7 @@
                       void* workSpace, size_t wkspSize)
 {
     return HUF_compress_internal(dst, dstSize, src, srcSize,
-                                 maxSymbolValue, huffLog, 1 /*single stream*/,
+                                 maxSymbolValue, huffLog, HUF_singleStream,
                                  workSpace, wkspSize,
                                  NULL, NULL, 0, 0 /*bmi2*/);
 }
@@ -740,7 +742,7 @@
                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2)
 {
     return HUF_compress_internal(dst, dstSize, src, srcSize,
-                                 maxSymbolValue, huffLog, 1 /*single stream*/,
+                                 maxSymbolValue, huffLog, HUF_singleStream,
                                  workSpace, wkspSize, hufTable,
                                  repeat, preferRepeat, bmi2);
 }
@@ -762,7 +764,7 @@
                       void* workSpace, size_t wkspSize)
 {
     return HUF_compress_internal(dst, dstSize, src, srcSize,
-                                 maxSymbolValue, huffLog, 0 /*4 streams*/,
+                                 maxSymbolValue, huffLog, HUF_fourStreams,
                                  workSpace, wkspSize,
                                  NULL, NULL, 0, 0 /*bmi2*/);
 }
@@ -777,7 +779,7 @@
                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2)
 {
     return HUF_compress_internal(dst, dstSize, src, srcSize,
-                                 maxSymbolValue, huffLog, 0 /* 4 streams */,
+                                 maxSymbolValue, huffLog, HUF_fourStreams,
                                  workSpace, wkspSize,
                                  hufTable, repeat, preferRepeat, bmi2);
 }
diff --git a/contrib/python-zstandard/zstd/compress/zstd_compress.c b/contrib/python-zstandard/zstd/compress/zstd_compress.c
--- a/contrib/python-zstandard/zstd/compress/zstd_compress.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress.c
@@ -11,6 +11,7 @@
 /*-*************************************
 *  Dependencies
 ***************************************/
+#include <limits.h>         /* INT_MAX */
 #include <string.h>         /* memset */
 #include "cpu.h"
 #include "mem.h"
@@ -61,7 +62,7 @@
     memset(cctx, 0, sizeof(*cctx));
     cctx->customMem = memManager;
     cctx->bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());
-    {   size_t const err = ZSTD_CCtx_resetParameters(cctx);
+    {   size_t const err = ZSTD_CCtx_reset(cctx, ZSTD_reset_parameters);
         assert(!ZSTD_isError(err));
         (void)err;
     }
@@ -128,7 +129,7 @@
 #ifdef ZSTD_MULTITHREAD
     return ZSTDMT_sizeof_CCtx(cctx->mtctx);
 #else
-    (void) cctx;
+    (void)cctx;
     return 0;
 #endif
 }
@@ -226,9 +227,160 @@
     return ret;
 }
 
-#define CLAMPCHECK(val,min,max) {            \
-    if (((val)<(min)) | ((val)>(max))) {     \
-        return ERROR(parameter_outOfBound);  \
+ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter param)
+{
+    ZSTD_bounds bounds = { 0, 0, 0 };
+
+    switch(param)
+    {
+    case ZSTD_c_compressionLevel:
+        bounds.lowerBound = ZSTD_minCLevel();
+        bounds.upperBound = ZSTD_maxCLevel();
+        return bounds;
+
+    case ZSTD_c_windowLog:
+        bounds.lowerBound = ZSTD_WINDOWLOG_MIN;
+        bounds.upperBound = ZSTD_WINDOWLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_hashLog:
+        bounds.lowerBound = ZSTD_HASHLOG_MIN;
+        bounds.upperBound = ZSTD_HASHLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_chainLog:
+        bounds.lowerBound = ZSTD_CHAINLOG_MIN;
+        bounds.upperBound = ZSTD_CHAINLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_searchLog:
+        bounds.lowerBound = ZSTD_SEARCHLOG_MIN;
+        bounds.upperBound = ZSTD_SEARCHLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_minMatch:
+        bounds.lowerBound = ZSTD_MINMATCH_MIN;
+        bounds.upperBound = ZSTD_MINMATCH_MAX;
+        return bounds;
+
+    case ZSTD_c_targetLength:
+        bounds.lowerBound = ZSTD_TARGETLENGTH_MIN;
+        bounds.upperBound = ZSTD_TARGETLENGTH_MAX;
+        return bounds;
+
+    case ZSTD_c_strategy:
+        bounds.lowerBound = ZSTD_STRATEGY_MIN;
+        bounds.upperBound = ZSTD_STRATEGY_MAX;
+        return bounds;
+
+    case ZSTD_c_contentSizeFlag:
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_checksumFlag:
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_dictIDFlag:
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_nbWorkers:
+        bounds.lowerBound = 0;
+#ifdef ZSTD_MULTITHREAD
+        bounds.upperBound = ZSTDMT_NBWORKERS_MAX;
+#else
+        bounds.upperBound = 0;
+#endif
+        return bounds;
+
+    case ZSTD_c_jobSize:
+        bounds.lowerBound = 0;
+#ifdef ZSTD_MULTITHREAD
+        bounds.upperBound = ZSTDMT_JOBSIZE_MAX;
+#else
+        bounds.upperBound = 0;
+#endif
+        return bounds;
+
+    case ZSTD_c_overlapLog:
+        bounds.lowerBound = ZSTD_OVERLAPLOG_MIN;
+        bounds.upperBound = ZSTD_OVERLAPLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_enableLongDistanceMatching:
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_ldmHashLog:
+        bounds.lowerBound = ZSTD_LDM_HASHLOG_MIN;
+        bounds.upperBound = ZSTD_LDM_HASHLOG_MAX;
+        return bounds;
+
+    case ZSTD_c_ldmMinMatch:
+        bounds.lowerBound = ZSTD_LDM_MINMATCH_MIN;
+        bounds.upperBound = ZSTD_LDM_MINMATCH_MAX;
+        return bounds;
+
+    case ZSTD_c_ldmBucketSizeLog:
+        bounds.lowerBound = ZSTD_LDM_BUCKETSIZELOG_MIN;
+        bounds.upperBound = ZSTD_LDM_BUCKETSIZELOG_MAX;
+        return bounds;
+
+    case ZSTD_c_ldmHashRateLog:
+        bounds.lowerBound = ZSTD_LDM_HASHRATELOG_MIN;
+        bounds.upperBound = ZSTD_LDM_HASHRATELOG_MAX;
+        return bounds;
+
+    /* experimental parameters */
+    case ZSTD_c_rsyncable:
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_forceMaxWindow :
+        bounds.lowerBound = 0;
+        bounds.upperBound = 1;
+        return bounds;
+
+    case ZSTD_c_format:
+        ZSTD_STATIC_ASSERT(ZSTD_f_zstd1 < ZSTD_f_zstd1_magicless);
+        bounds.lowerBound = ZSTD_f_zstd1;
+        bounds.upperBound = ZSTD_f_zstd1_magicless;   /* note : how to ensure at compile time that this is the highest value enum ? */
+        return bounds;
+
+    case ZSTD_c_forceAttachDict:
+        ZSTD_STATIC_ASSERT(ZSTD_dictDefaultAttach < ZSTD_dictForceCopy);
+        bounds.lowerBound = ZSTD_dictDefaultAttach;
+        bounds.upperBound = ZSTD_dictForceCopy;       /* note : how to ensure at compile time that this is the highest value enum ? */
+        return bounds;
+
+    default:
+        {   ZSTD_bounds const boundError = { ERROR(parameter_unsupported), 0, 0 };
+            return boundError;
+        }
+    }
+}
+
+/* ZSTD_cParam_withinBounds:
+ * @return 1 if value is within cParam bounds,
+ * 0 otherwise */
+static int ZSTD_cParam_withinBounds(ZSTD_cParameter cParam, int value)
+{
+    ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);
+    if (ZSTD_isError(bounds.error)) return 0;
+    if (value < bounds.lowerBound) return 0;
+    if (value > bounds.upperBound) return 0;
+    return 1;
+}
+
+#define BOUNDCHECK(cParam, val) {                  \
+    if (!ZSTD_cParam_withinBounds(cParam,val)) {   \
+        return ERROR(parameter_outOfBound);        \
 }   }
 
 
@@ -236,38 +388,39 @@
 {
     switch(param)
     {
-    case ZSTD_p_compressionLevel:
-    case ZSTD_p_hashLog:
-    case ZSTD_p_chainLog:
-    case ZSTD_p_searchLog:
-    case ZSTD_p_minMatch:
-    case ZSTD_p_targetLength:
-    case ZSTD_p_compressionStrategy:
+    case ZSTD_c_compressionLevel:
+    case ZSTD_c_hashLog:
+    case ZSTD_c_chainLog:
+    case ZSTD_c_searchLog:
+    case ZSTD_c_minMatch:
+    case ZSTD_c_targetLength:
+    case ZSTD_c_strategy:
         return 1;
 
-    case ZSTD_p_format:
-    case ZSTD_p_windowLog:
-    case ZSTD_p_contentSizeFlag:
-    case ZSTD_p_checksumFlag:
-    case ZSTD_p_dictIDFlag:
-    case ZSTD_p_forceMaxWindow :
-    case ZSTD_p_nbWorkers:
-    case ZSTD_p_jobSize:
-    case ZSTD_p_overlapSizeLog:
-    case ZSTD_p_enableLongDistanceMatching:
-    case ZSTD_p_ldmHashLog:
-    case ZSTD_p_ldmMinMatch:
-    case ZSTD_p_ldmBucketSizeLog:
-    case ZSTD_p_ldmHashEveryLog:
-    case ZSTD_p_forceAttachDict:
+    case ZSTD_c_format:
+    case ZSTD_c_windowLog:
+    case ZSTD_c_contentSizeFlag:
+    case ZSTD_c_checksumFlag:
+    case ZSTD_c_dictIDFlag:
+    case ZSTD_c_forceMaxWindow :
+    case ZSTD_c_nbWorkers:
+    case ZSTD_c_jobSize:
+    case ZSTD_c_overlapLog:
+    case ZSTD_c_rsyncable:
+    case ZSTD_c_enableLongDistanceMatching:
+    case ZSTD_c_ldmHashLog:
+    case ZSTD_c_ldmMinMatch:
+    case ZSTD_c_ldmBucketSizeLog:
+    case ZSTD_c_ldmHashRateLog:
+    case ZSTD_c_forceAttachDict:
     default:
         return 0;
     }
 }
 
-size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value)
+size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value)
 {
-    DEBUGLOG(4, "ZSTD_CCtx_setParameter (%u, %u)", (U32)param, value);
+    DEBUGLOG(4, "ZSTD_CCtx_setParameter (%i, %i)", (int)param, value);
     if (cctx->streamStage != zcss_init) {
         if (ZSTD_isUpdateAuthorized(param)) {
             cctx->cParamsChanged = 1;
@@ -277,51 +430,52 @@
 
     switch(param)
     {
-    case ZSTD_p_format :
+    case ZSTD_c_format :
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_compressionLevel:
+    case ZSTD_c_compressionLevel:
         if (cctx->cdict) return ERROR(stage_wrong);
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_windowLog:
-    case ZSTD_p_hashLog:
-    case ZSTD_p_chainLog:
-    case ZSTD_p_searchLog:
-    case ZSTD_p_minMatch:
-    case ZSTD_p_targetLength:
-    case ZSTD_p_compressionStrategy:
+    case ZSTD_c_windowLog:
+    case ZSTD_c_hashLog:
+    case ZSTD_c_chainLog:
+    case ZSTD_c_searchLog:
+    case ZSTD_c_minMatch:
+    case ZSTD_c_targetLength:
+    case ZSTD_c_strategy:
         if (cctx->cdict) return ERROR(stage_wrong);
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_contentSizeFlag:
-    case ZSTD_p_checksumFlag:
-    case ZSTD_p_dictIDFlag:
+    case ZSTD_c_contentSizeFlag:
+    case ZSTD_c_checksumFlag:
+    case ZSTD_c_dictIDFlag:
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_forceMaxWindow :  /* Force back-references to remain < windowSize,
+    case ZSTD_c_forceMaxWindow :  /* Force back-references to remain < windowSize,
                                    * even when referencing into Dictionary content.
                                    * default : 0 when using a CDict, 1 when using a Prefix */
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_forceAttachDict:
+    case ZSTD_c_forceAttachDict:
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_nbWorkers:
-        if ((value>0) && cctx->staticSize) {
+    case ZSTD_c_nbWorkers:
+        if ((value!=0) && cctx->staticSize) {
             return ERROR(parameter_unsupported);  /* MT not compatible with static alloc */
         }
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_jobSize:
-    case ZSTD_p_overlapSizeLog:
+    case ZSTD_c_jobSize:
+    case ZSTD_c_overlapLog:
+    case ZSTD_c_rsyncable:
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
-    case ZSTD_p_enableLongDistanceMatching:
-    case ZSTD_p_ldmHashLog:
-    case ZSTD_p_ldmMinMatch:
-    case ZSTD_p_ldmBucketSizeLog:
-    case ZSTD_p_ldmHashEveryLog:
+    case ZSTD_c_enableLongDistanceMatching:
+    case ZSTD_c_ldmHashLog:
+    case ZSTD_c_ldmMinMatch:
+    case ZSTD_c_ldmBucketSizeLog:
+    case ZSTD_c_ldmHashRateLog:
         if (cctx->cdict) return ERROR(stage_wrong);
         return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
 
@@ -329,21 +483,21 @@
     }
 }
 
-size_t ZSTD_CCtxParam_setParameter(
-        ZSTD_CCtx_params* CCtxParams, ZSTD_cParameter param, unsigned value)
+size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* CCtxParams,
+                                   ZSTD_cParameter param, int value)
 {
-    DEBUGLOG(4, "ZSTD_CCtxParam_setParameter (%u, %u)", (U32)param, value);
+    DEBUGLOG(4, "ZSTD_CCtxParam_setParameter (%i, %i)", (int)param, value);
     switch(param)
     {
-    case ZSTD_p_format :
-        if (value > (unsigned)ZSTD_f_zstd1_magicless)
-            return ERROR(parameter_unsupported);
+    case ZSTD_c_format :
+        BOUNDCHECK(ZSTD_c_format, value);
         CCtxParams->format = (ZSTD_format_e)value;
         return (size_t)CCtxParams->format;
 
-    case ZSTD_p_compressionLevel : {
-        int cLevel = (int)value;  /* cast expected to restore negative sign */
+    case ZSTD_c_compressionLevel : {
+        int cLevel = value;
         if (cLevel > ZSTD_maxCLevel()) cLevel = ZSTD_maxCLevel();
+        if (cLevel < ZSTD_minCLevel()) cLevel = ZSTD_minCLevel();
         if (cLevel) {  /* 0 : does not change current level */
             CCtxParams->compressionLevel = cLevel;
         }
@@ -351,213 +505,229 @@
         return 0;  /* return type (size_t) cannot represent negative values */
     }
 
-    case ZSTD_p_windowLog :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, ZSTD_WINDOWLOG_MIN, ZSTD_WINDOWLOG_MAX);
+    case ZSTD_c_windowLog :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_windowLog, value);
         CCtxParams->cParams.windowLog = value;
         return CCtxParams->cParams.windowLog;
 
-    case ZSTD_p_hashLog :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, ZSTD_HASHLOG_MIN, ZSTD_HASHLOG_MAX);
+    case ZSTD_c_hashLog :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_hashLog, value);
         CCtxParams->cParams.hashLog = value;
         return CCtxParams->cParams.hashLog;
 
-    case ZSTD_p_chainLog :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, ZSTD_CHAINLOG_MIN, ZSTD_CHAINLOG_MAX);
+    case ZSTD_c_chainLog :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_chainLog, value);
         CCtxParams->cParams.chainLog = value;
         return CCtxParams->cParams.chainLog;
 
-    case ZSTD_p_searchLog :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, ZSTD_SEARCHLOG_MIN, ZSTD_SEARCHLOG_MAX);
+    case ZSTD_c_searchLog :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_searchLog, value);
         CCtxParams->cParams.searchLog = value;
         return value;
 
-    case ZSTD_p_minMatch :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, ZSTD_SEARCHLENGTH_MIN, ZSTD_SEARCHLENGTH_MAX);
-        CCtxParams->cParams.searchLength = value;
-        return CCtxParams->cParams.searchLength;
-
-    case ZSTD_p_targetLength :
-        /* all values are valid. 0 => use default */
+    case ZSTD_c_minMatch :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_minMatch, value);
+        CCtxParams->cParams.minMatch = value;
+        return CCtxParams->cParams.minMatch;
+
+    case ZSTD_c_targetLength :
+        BOUNDCHECK(ZSTD_c_targetLength, value);
         CCtxParams->cParams.targetLength = value;
         return CCtxParams->cParams.targetLength;
 
-    case ZSTD_p_compressionStrategy :
-        if (value>0)   /* 0 => use default */
-            CLAMPCHECK(value, (unsigned)ZSTD_fast, (unsigned)ZSTD_btultra);
+    case ZSTD_c_strategy :
+        if (value!=0)   /* 0 => use default */
+            BOUNDCHECK(ZSTD_c_strategy, value);
         CCtxParams->cParams.strategy = (ZSTD_strategy)value;
         return (size_t)CCtxParams->cParams.strategy;
 
-    case ZSTD_p_contentSizeFlag :
+    case ZSTD_c_contentSizeFlag :
         /* Content size written in frame header _when known_ (default:1) */
-        DEBUGLOG(4, "set content size flag = %u", (value>0));
-        CCtxParams->fParams.contentSizeFlag = value > 0;
+        DEBUGLOG(4, "set content size flag = %u", (value!=0));
+        CCtxParams->fParams.contentSizeFlag = value != 0;
         return CCtxParams->fParams.contentSizeFlag;
 
-    case ZSTD_p_checksumFlag :
+    case ZSTD_c_checksumFlag :
         /* A 32-bits content checksum will be calculated and written at end of frame (default:0) */
-        CCtxParams->fParams.checksumFlag = value > 0;
+        CCtxParams->fParams.checksumFlag = value != 0;
         return CCtxParams->fParams.checksumFlag;
 
-    case ZSTD_p_dictIDFlag : /* When applicable, dictionary's dictID is provided in frame header (default:1) */
-        DEBUGLOG(4, "set dictIDFlag = %u", (value>0));
+    case ZSTD_c_dictIDFlag : /* When applicable, dictionary's dictID is provided in frame header (default:1) */
+        DEBUGLOG(4, "set dictIDFlag = %u", (value!=0));
         CCtxParams->fParams.noDictIDFlag = !value;
         return !CCtxParams->fParams.noDictIDFlag;
 
-    case ZSTD_p_forceMaxWindow :
-        CCtxParams->forceWindow = (value > 0);
+    case ZSTD_c_forceMaxWindow :
+        CCtxParams->forceWindow = (value != 0);
         return CCtxParams->forceWindow;
 
-    case ZSTD_p_forceAttachDict :
-        CCtxParams->attachDictPref = value ?
-                                    (value > 0 ? ZSTD_dictForceAttach : ZSTD_dictForceCopy) :
-                                     ZSTD_dictDefaultAttach;
+    case ZSTD_c_forceAttachDict : {
+        const ZSTD_dictAttachPref_e pref = (ZSTD_dictAttachPref_e)value;
+        BOUNDCHECK(ZSTD_c_forceAttachDict, pref);
+        CCtxParams->attachDictPref = pref;
         return CCtxParams->attachDictPref;
-
-    case ZSTD_p_nbWorkers :
+    }
+
+    case ZSTD_c_nbWorkers :
 #ifndef ZSTD_MULTITHREAD
-        if (value>0) return ERROR(parameter_unsupported);
+        if (value!=0) return ERROR(parameter_unsupported);
         return 0;
 #else
         return ZSTDMT_CCtxParam_setNbWorkers(CCtxParams, value);
 #endif
 
-    case ZSTD_p_jobSize :
+    case ZSTD_c_jobSize :
 #ifndef ZSTD_MULTITHREAD
         return ERROR(parameter_unsupported);
 #else
         return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_jobSize, value);
 #endif
 
-    case ZSTD_p_overlapSizeLog :
+    case ZSTD_c_overlapLog :
+#ifndef ZSTD_MULTITHREAD
+        return ERROR(parameter_unsupported);
+#else
+        return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_overlapLog, value);
+#endif
+
+    case ZSTD_c_rsyncable :
 #ifndef ZSTD_MULTITHREAD
         return ERROR(parameter_unsupported);
 #else
-        return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_overlapSectionLog, value);
+        return ZSTDMT_CCtxParam_setMTCtxParameter(CCtxParams, ZSTDMT_p_rsyncable, value);
 #endif
 
-    case ZSTD_p_enableLongDistanceMatching :
-        CCtxParams->ldmParams.enableLdm = (value>0);
+    case ZSTD_c_enableLongDistanceMatching :
+        CCtxParams->ldmParams.enableLdm = (value!=0);
         return CCtxParams->ldmParams.enableLdm;
 
-    case ZSTD_p_ldmHashLog :
-        if (value>0)   /* 0 ==> auto */
-            CLAMPCHECK(value, ZSTD_HASHLOG_MIN, ZSTD_HASHLOG_MAX);
+    case ZSTD_c_ldmHashLog :
+        if (value!=0)   /* 0 ==> auto */
+            BOUNDCHECK(ZSTD_c_ldmHashLog, value);
         CCtxParams->ldmParams.hashLog = value;
         return CCtxParams->ldmParams.hashLog;
 
-    case ZSTD_p_ldmMinMatch :
-        if (value>0)   /* 0 ==> default */
-            CLAMPCHECK(value, ZSTD_LDM_MINMATCH_MIN, ZSTD_LDM_MINMATCH_MAX);
+    case ZSTD_c_ldmMinMatch :
+        if (value!=0)   /* 0 ==> default */
+            BOUNDCHECK(ZSTD_c_ldmMinMatch, value);
         CCtxParams->ldmParams.minMatchLength = value;
         return CCtxParams->ldmParams.minMatchLength;
 
-    case ZSTD_p_ldmBucketSizeLog :
-        if (value > ZSTD_LDM_BUCKETSIZELOG_MAX)
-            return ERROR(parameter_outOfBound);
+    case ZSTD_c_ldmBucketSizeLog :
+        if (value!=0)   /* 0 ==> default */
+            BOUNDCHECK(ZSTD_c_ldmBucketSizeLog, value);
         CCtxParams->ldmParams.bucketSizeLog = value;
         return CCtxParams->ldmParams.bucketSizeLog;
 
-    case ZSTD_p_ldmHashEveryLog :
+    case ZSTD_c_ldmHashRateLog :
         if (value > ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN)
             return ERROR(parameter_outOfBound);
-        CCtxParams->ldmParams.hashEveryLog = value;
-        return CCtxParams->ldmParams.hashEveryLog;
+        CCtxParams->ldmParams.hashRateLog = value;
+        return CCtxParams->ldmParams.hashRateLog;
 
     default: return ERROR(parameter_unsupported);
     }
 }
 
-size_t ZSTD_CCtx_getParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned* value)
+size_t ZSTD_CCtx_getParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int* value)
 {
     return ZSTD_CCtxParam_getParameter(&cctx->requestedParams, param, value);
 }
 
 size_t ZSTD_CCtxParam_getParameter(
-        ZSTD_CCtx_params* CCtxParams, ZSTD_cParameter param, unsigned* value)
+        ZSTD_CCtx_params* CCtxParams, ZSTD_cParameter param, int* value)
 {
     switch(param)
     {
-    case ZSTD_p_format :
+    case ZSTD_c_format :
         *value = CCtxParams->format;
         break;
-    case ZSTD_p_compressionLevel :
+    case ZSTD_c_compressionLevel :
         *value = CCtxParams->compressionLevel;
         break;
-    case ZSTD_p_windowLog :
+    case ZSTD_c_windowLog :
         *value = CCtxParams->cParams.windowLog;
         break;
-    case ZSTD_p_hashLog :
+    case ZSTD_c_hashLog :
         *value = CCtxParams->cParams.hashLog;
         break;
-    case ZSTD_p_chainLog :
+    case ZSTD_c_chainLog :
         *value = CCtxParams->cParams.chainLog;
         break;
-    case ZSTD_p_searchLog :
+    case ZSTD_c_searchLog :
         *value = CCtxParams->cParams.searchLog;
         break;
-    case ZSTD_p_minMatch :
-        *value = CCtxParams->cParams.searchLength;
+    case ZSTD_c_minMatch :
+        *value = CCtxParams->cParams.minMatch;
         break;
-    case ZSTD_p_targetLength :
+    case ZSTD_c_targetLength :
         *value = CCtxParams->cParams.targetLength;
         break;
-    case ZSTD_p_compressionStrategy :
+    case ZSTD_c_strategy :
         *value = (unsigned)CCtxParams->cParams.strategy;
         break;
-    case ZSTD_p_contentSizeFlag :
+    case ZSTD_c_contentSizeFlag :
         *value = CCtxParams->fParams.contentSizeFlag;
         break;
-    case ZSTD_p_checksumFlag :
+    case ZSTD_c_checksumFlag :
         *value = CCtxParams->fParams.checksumFlag;
         break;
-    case ZSTD_p_dictIDFlag :
+    case ZSTD_c_dictIDFlag :
         *value = !CCtxParams->fParams.noDictIDFlag;
         break;
-    case ZSTD_p_forceMaxWindow :
+    case ZSTD_c_forceMaxWindow :
         *value = CCtxParams->forceWindow;
         break;
-    case ZSTD_p_forceAttachDict :
+    case ZSTD_c_forceAttachDict :
         *value = CCtxParams->attachDictPref;
         break;
-    case ZSTD_p_nbWorkers :
+    case ZSTD_c_nbWorkers :
 #ifndef ZSTD_MULTITHREAD
         assert(CCtxParams->nbWorkers == 0);
 #endif
         *value = CCtxParams->nbWorkers;
         break;
-    case ZSTD_p_jobSize :
+    case ZSTD_c_jobSize :
 #ifndef ZSTD_MULTITHREAD
         return ERROR(parameter_unsupported);
 #else
-        *value = CCtxParams->jobSize;
+        assert(CCtxParams->jobSize <= INT_MAX);
+        *value = (int)CCtxParams->jobSize;
         break;
 #endif
-    case ZSTD_p_overlapSizeLog :
+    case ZSTD_c_overlapLog :
 #ifndef ZSTD_MULTITHREAD
         return ERROR(parameter_unsupported);
 #else
-        *value = CCtxParams->overlapSizeLog;
+        *value = CCtxParams->overlapLog;
         break;
 #endif
-    case ZSTD_p_enableLongDistanceMatching :
+    case ZSTD_c_rsyncable :
+#ifndef ZSTD_MULTITHREAD
+        return ERROR(parameter_unsupported);
+#else
+        *value = CCtxParams->rsyncable;
+        break;
+#endif
+    case ZSTD_c_enableLongDistanceMatching :
         *value = CCtxParams->ldmParams.enableLdm;
         break;
-    case ZSTD_p_ldmHashLog :
+    case ZSTD_c_ldmHashLog :
         *value = CCtxParams->ldmParams.hashLog;
         break;
-    case ZSTD_p_ldmMinMatch :
+    case ZSTD_c_ldmMinMatch :
         *value = CCtxParams->ldmParams.minMatchLength;
         break;
-    case ZSTD_p_ldmBucketSizeLog :
+    case ZSTD_c_ldmBucketSizeLog :
         *value = CCtxParams->ldmParams.bucketSizeLog;
         break;
-    case ZSTD_p_ldmHashEveryLog :
-        *value = CCtxParams->ldmParams.hashEveryLog;
+    case ZSTD_c_ldmHashRateLog :
+        *value = CCtxParams->ldmParams.hashRateLog;
         break;
     default: return ERROR(parameter_unsupported);
     }
@@ -655,34 +825,35 @@
 
 /*! ZSTD_CCtx_reset() :
  *  Also dumps dictionary */
-void ZSTD_CCtx_reset(ZSTD_CCtx* cctx)
+size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset)
 {
-    cctx->streamStage = zcss_init;
-    cctx->pledgedSrcSizePlusOne = 0;
+    if ( (reset == ZSTD_reset_session_only)
+      || (reset == ZSTD_reset_session_and_parameters) ) {
+        cctx->streamStage = zcss_init;
+        cctx->pledgedSrcSizePlusOne = 0;
+    }
+    if ( (reset == ZSTD_reset_parameters)
+      || (reset == ZSTD_reset_session_and_parameters) ) {
+        if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
+        cctx->cdict = NULL;
+        return ZSTD_CCtxParams_reset(&cctx->requestedParams);
+    }
+    return 0;
 }
 
-size_t ZSTD_CCtx_resetParameters(ZSTD_CCtx* cctx)
-{
-    if (cctx->streamStage != zcss_init) return ERROR(stage_wrong);
-    cctx->cdict = NULL;
-    return ZSTD_CCtxParams_reset(&cctx->requestedParams);
-}
 
 /** ZSTD_checkCParams() :
     control CParam values remain within authorized range.
     @return : 0, or an error code if one value is beyond authorized range */
 size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
 {
-    CLAMPCHECK(cParams.windowLog, ZSTD_WINDOWLOG_MIN, ZSTD_WINDOWLOG_MAX);
-    CLAMPCHECK(cParams.chainLog, ZSTD_CHAINLOG_MIN, ZSTD_CHAINLOG_MAX);
-    CLAMPCHECK(cParams.hashLog, ZSTD_HASHLOG_MIN, ZSTD_HASHLOG_MAX);
-    CLAMPCHECK(cParams.searchLog, ZSTD_SEARCHLOG_MIN, ZSTD_SEARCHLOG_MAX);
-    CLAMPCHECK(cParams.searchLength, ZSTD_SEARCHLENGTH_MIN, ZSTD_SEARCHLENGTH_MAX);
-    ZSTD_STATIC_ASSERT(ZSTD_TARGETLENGTH_MIN == 0);
-    if (cParams.targetLength > ZSTD_TARGETLENGTH_MAX)
-        return ERROR(parameter_outOfBound);
-    if ((U32)(cParams.strategy) > (U32)ZSTD_btultra)
-        return ERROR(parameter_unsupported);
+    BOUNDCHECK(ZSTD_c_windowLog, cParams.windowLog);
+    BOUNDCHECK(ZSTD_c_chainLog,  cParams.chainLog);
+    BOUNDCHECK(ZSTD_c_hashLog,   cParams.hashLog);
+    BOUNDCHECK(ZSTD_c_searchLog, cParams.searchLog);
+    BOUNDCHECK(ZSTD_c_minMatch,  cParams.minMatch);
+    BOUNDCHECK(ZSTD_c_targetLength,cParams.targetLength);
+    BOUNDCHECK(ZSTD_c_strategy,  cParams.strategy);
     return 0;
 }
 
@@ -692,19 +863,19 @@
 static ZSTD_compressionParameters
 ZSTD_clampCParams(ZSTD_compressionParameters cParams)
 {
-#   define CLAMP(val,min,max) {      \
-        if (val<min) val=min;        \
-        else if (val>max) val=max;   \
+#   define CLAMP_TYPE(cParam, val, type) {                                \
+        ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);         \
+        if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound;      \
+        else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
     }
-    CLAMP(cParams.windowLog, ZSTD_WINDOWLOG_MIN, ZSTD_WINDOWLOG_MAX);
-    CLAMP(cParams.chainLog, ZSTD_CHAINLOG_MIN, ZSTD_CHAINLOG_MAX);
-    CLAMP(cParams.hashLog, ZSTD_HASHLOG_MIN, ZSTD_HASHLOG_MAX);
-    CLAMP(cParams.searchLog, ZSTD_SEARCHLOG_MIN, ZSTD_SEARCHLOG_MAX);
-    CLAMP(cParams.searchLength, ZSTD_SEARCHLENGTH_MIN, ZSTD_SEARCHLENGTH_MAX);
-    ZSTD_STATIC_ASSERT(ZSTD_TARGETLENGTH_MIN == 0);
-    if (cParams.targetLength > ZSTD_TARGETLENGTH_MAX)
-        cParams.targetLength = ZSTD_TARGETLENGTH_MAX;
-    CLAMP(cParams.strategy, ZSTD_fast, ZSTD_btultra);
+#   define CLAMP(cParam, val) CLAMP_TYPE(cParam, val, int)
+    CLAMP(ZSTD_c_windowLog, cParams.windowLog);
+    CLAMP(ZSTD_c_chainLog,  cParams.chainLog);
+    CLAMP(ZSTD_c_hashLog,   cParams.hashLog);
+    CLAMP(ZSTD_c_searchLog, cParams.searchLog);
+    CLAMP(ZSTD_c_minMatch,  cParams.minMatch);
+    CLAMP(ZSTD_c_targetLength,cParams.targetLength);
+    CLAMP_TYPE(ZSTD_c_strategy,cParams.strategy, ZSTD_strategy);
     return cParams;
 }
 
@@ -774,7 +945,7 @@
     if (CCtxParams->cParams.hashLog) cParams.hashLog = CCtxParams->cParams.hashLog;
     if (CCtxParams->cParams.chainLog) cParams.chainLog = CCtxParams->cParams.chainLog;
     if (CCtxParams->cParams.searchLog) cParams.searchLog = CCtxParams->cParams.searchLog;
-    if (CCtxParams->cParams.searchLength) cParams.searchLength = CCtxParams->cParams.searchLength;
+    if (CCtxParams->cParams.minMatch) cParams.minMatch = CCtxParams->cParams.minMatch;
     if (CCtxParams->cParams.targetLength) cParams.targetLength = CCtxParams->cParams.targetLength;
     if (CCtxParams->cParams.strategy) cParams.strategy = CCtxParams->cParams.strategy;
     assert(!ZSTD_checkCParams(cParams));
@@ -787,13 +958,12 @@
 {
     size_t const chainSize = (cParams->strategy == ZSTD_fast) ? 0 : ((size_t)1 << cParams->chainLog);
     size_t const hSize = ((size_t)1) << cParams->hashLog;
-    U32    const hashLog3 = (forCCtx && cParams->searchLength==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
+    U32    const hashLog3 = (forCCtx && cParams->minMatch==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
     size_t const h3Size = ((size_t)1) << hashLog3;
     size_t const tableSpace = (chainSize + hSize + h3Size) * sizeof(U32);
     size_t const optPotentialSpace = ((MaxML+1) + (MaxLL+1) + (MaxOff+1) + (1<<Litbits)) * sizeof(U32)
                           + (ZSTD_OPT_NUM+1) * (sizeof(ZSTD_match_t)+sizeof(ZSTD_optimal_t));
-    size_t const optSpace = (forCCtx && ((cParams->strategy == ZSTD_btopt) ||
-                                         (cParams->strategy == ZSTD_btultra)))
+    size_t const optSpace = (forCCtx && (cParams->strategy >= ZSTD_btopt))
                                 ? optPotentialSpace
                                 : 0;
     DEBUGLOG(4, "chainSize: %u - hSize: %u - h3Size: %u",
@@ -808,7 +978,7 @@
     {   ZSTD_compressionParameters const cParams =
                 ZSTD_getCParamsFromCCtxParams(params, 0, 0);
         size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
-        U32    const divider = (cParams.searchLength==3) ? 3 : 4;
+        U32    const divider = (cParams.minMatch==3) ? 3 : 4;
         size_t const maxNbSeq = blockSize / divider;
         size_t const tokenSpace = WILDCOPY_OVERLENGTH + blockSize + 11*maxNbSeq;
         size_t const entropySpace = HUF_WORKSPACE_SIZE;
@@ -843,7 +1013,7 @@
 {
     int level;
     size_t memBudget = 0;
-    for (level=1; level<=compressionLevel; level++) {
+    for (level=MIN(compressionLevel, 1); level<=compressionLevel; level++) {
         size_t const newMB = ZSTD_estimateCCtxSize_internal(level);
         if (newMB > memBudget) memBudget = newMB;
     }
@@ -879,7 +1049,7 @@
 {
     int level;
     size_t memBudget = 0;
-    for (level=1; level<=compressionLevel; level++) {
+    for (level=MIN(compressionLevel, 1); level<=compressionLevel; level++) {
         size_t const newMB = ZSTD_estimateCStreamSize_internal(level);
         if (newMB > memBudget) memBudget = newMB;
     }
@@ -933,7 +1103,7 @@
     return (cParams1.hashLog  == cParams2.hashLog)
          & (cParams1.chainLog == cParams2.chainLog)
          & (cParams1.strategy == cParams2.strategy)   /* opt parser space */
-         & ((cParams1.searchLength==3) == (cParams2.searchLength==3));  /* hashlog3 space */
+         & ((cParams1.minMatch==3) == (cParams2.minMatch==3));  /* hashlog3 space */
 }
 
 static void ZSTD_assertEqualCParams(ZSTD_compressionParameters cParams1,
@@ -945,7 +1115,7 @@
     assert(cParams1.chainLog     == cParams2.chainLog);
     assert(cParams1.hashLog      == cParams2.hashLog);
     assert(cParams1.searchLog    == cParams2.searchLog);
-    assert(cParams1.searchLength == cParams2.searchLength);
+    assert(cParams1.minMatch     == cParams2.minMatch);
     assert(cParams1.targetLength == cParams2.targetLength);
     assert(cParams1.strategy     == cParams2.strategy);
 }
@@ -960,7 +1130,7 @@
             ldmParams1.hashLog == ldmParams2.hashLog &&
             ldmParams1.bucketSizeLog == ldmParams2.bucketSizeLog &&
             ldmParams1.minMatchLength == ldmParams2.minMatchLength &&
-            ldmParams1.hashEveryLog == ldmParams2.hashEveryLog);
+            ldmParams1.hashRateLog == ldmParams2.hashRateLog);
 }
 
 typedef enum { ZSTDb_not_buffered, ZSTDb_buffered } ZSTD_buffered_policy_e;
@@ -976,7 +1146,7 @@
 {
     size_t const windowSize2 = MAX(1, (size_t)MIN(((U64)1 << cParams2.windowLog), pledgedSrcSize));
     size_t const blockSize2 = MIN(ZSTD_BLOCKSIZE_MAX, windowSize2);
-    size_t const maxNbSeq2 = blockSize2 / ((cParams2.searchLength == 3) ? 3 : 4);
+    size_t const maxNbSeq2 = blockSize2 / ((cParams2.minMatch == 3) ? 3 : 4);
     size_t const maxNbLit2 = blockSize2;
     size_t const neededBufferSize2 = (buffPol2==ZSTDb_buffered) ? windowSize2 + blockSize2 : 0;
     DEBUGLOG(4, "ZSTD_sufficientBuff: is neededBufferSize2=%u <= bufferSize1=%u",
@@ -1034,8 +1204,8 @@
 {
     ZSTD_window_clear(&ms->window);
 
-    ms->nextToUpdate = ms->window.dictLimit + 1;
-    ms->nextToUpdate3 = ms->window.dictLimit + 1;
+    ms->nextToUpdate = ms->window.dictLimit;
+    ms->nextToUpdate3 = ms->window.dictLimit;
     ms->loadedDictEnd = 0;
     ms->opt.litLengthSum = 0;  /* force reset of btopt stats */
     ms->dictMatchState = NULL;
@@ -1080,7 +1250,7 @@
 {
     size_t const chainSize = (cParams->strategy == ZSTD_fast) ? 0 : ((size_t)1 << cParams->chainLog);
     size_t const hSize = ((size_t)1) << cParams->hashLog;
-    U32    const hashLog3 = (forCCtx && cParams->searchLength==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
+    U32    const hashLog3 = (forCCtx && cParams->minMatch==3) ? MIN(ZSTD_HASHLOG3_MAX, cParams->windowLog) : 0;
     size_t const h3Size = ((size_t)1) << hashLog3;
     size_t const tableSpace = (chainSize + hSize + h3Size) * sizeof(U32);
 
@@ -1094,9 +1264,9 @@
     ZSTD_invalidateMatchState(ms);
 
     /* opt parser space */
-    if (forCCtx && ((cParams->strategy == ZSTD_btopt) | (cParams->strategy == ZSTD_btultra))) {
+    if (forCCtx && (cParams->strategy >= ZSTD_btopt)) {
         DEBUGLOG(4, "reserving optimal parser space");
-        ms->opt.litFreq = (U32*)ptr;
+        ms->opt.litFreq = (unsigned*)ptr;
         ms->opt.litLengthFreq = ms->opt.litFreq + (1<<Litbits);
         ms->opt.matchLengthFreq = ms->opt.litLengthFreq + (MaxLL+1);
         ms->opt.offCodeFreq = ms->opt.matchLengthFreq + (MaxML+1);
@@ -1158,13 +1328,13 @@
         /* Adjust long distance matching parameters */
         ZSTD_ldm_adjustParameters(&params.ldmParams, &params.cParams);
         assert(params.ldmParams.hashLog >= params.ldmParams.bucketSizeLog);
-        assert(params.ldmParams.hashEveryLog < 32);
-        zc->ldmState.hashPower = ZSTD_ldm_getHashPower(params.ldmParams.minMatchLength);
+        assert(params.ldmParams.hashRateLog < 32);
+        zc->ldmState.hashPower = ZSTD_rollingHash_primePower(params.ldmParams.minMatchLength);
     }
 
     {   size_t const windowSize = MAX(1, (size_t)MIN(((U64)1 << params.cParams.windowLog), pledgedSrcSize));
         size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
-        U32    const divider = (params.cParams.searchLength==3) ? 3 : 4;
+        U32    const divider = (params.cParams.minMatch==3) ? 3 : 4;
         size_t const maxNbSeq = blockSize / divider;
         size_t const tokenSpace = WILDCOPY_OVERLENGTH + blockSize + 11*maxNbSeq;
         size_t const buffOutSize = (zbuff==ZSTDb_buffered) ? ZSTD_compressBound(blockSize)+1 : 0;
@@ -1227,7 +1397,7 @@
         if (pledgedSrcSize == ZSTD_CONTENTSIZE_UNKNOWN)
             zc->appliedParams.fParams.contentSizeFlag = 0;
         DEBUGLOG(4, "pledged content size : %u ; flag : %u",
-            (U32)pledgedSrcSize, zc->appliedParams.fParams.contentSizeFlag);
+            (unsigned)pledgedSrcSize, zc->appliedParams.fParams.contentSizeFlag);
         zc->blockSize = blockSize;
 
         XXH64_reset(&zc->xxhState, 0);
@@ -1306,16 +1476,17 @@
  * dictionary tables into the working context is faster than using them
  * in-place.
  */
-static const size_t attachDictSizeCutoffs[(unsigned)ZSTD_btultra+1] = {
-    8 KB, /* unused */
-    8 KB, /* ZSTD_fast */
+static const size_t attachDictSizeCutoffs[ZSTD_STRATEGY_MAX+1] = {
+    8 KB,  /* unused */
+    8 KB,  /* ZSTD_fast */
     16 KB, /* ZSTD_dfast */
     32 KB, /* ZSTD_greedy */
     32 KB, /* ZSTD_lazy */
     32 KB, /* ZSTD_lazy2 */
     32 KB, /* ZSTD_btlazy2 */
     32 KB, /* ZSTD_btopt */
-    8 KB /* ZSTD_btultra */
+    8 KB,  /* ZSTD_btultra */
+    8 KB   /* ZSTD_btultra2 */
 };
 
 static int ZSTD_shouldAttachDict(const ZSTD_CDict* cdict,
@@ -1447,7 +1618,8 @@
                             ZSTD_buffered_policy_e zbuff)
 {
 
-    DEBUGLOG(4, "ZSTD_resetCCtx_usingCDict (pledgedSrcSize=%u)", (U32)pledgedSrcSize);
+    DEBUGLOG(4, "ZSTD_resetCCtx_usingCDict (pledgedSrcSize=%u)",
+                (unsigned)pledgedSrcSize);
 
     if (ZSTD_shouldAttachDict(cdict, params, pledgedSrcSize)) {
         return ZSTD_resetCCtx_byAttachingCDict(
@@ -1670,7 +1842,9 @@
  * note : use same formula for both situations */
 static size_t ZSTD_minGain(size_t srcSize, ZSTD_strategy strat)
 {
-    U32 const minlog = (strat==ZSTD_btultra) ? 7 : 6;
+    U32 const minlog = (strat>=ZSTD_btultra) ? (U32)(strat) - 1 : 6;
+    ZSTD_STATIC_ASSERT(ZSTD_btultra == 8);
+    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
     return (srcSize >> minlog) + 2;
 }
 
@@ -1679,7 +1853,8 @@
                                      ZSTD_strategy strategy, int disableLiteralCompression,
                                      void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
-                                     U32* workspace, const int bmi2)
+                                     void* workspace, size_t wkspSize,
+                               const int bmi2)
 {
     size_t const minGain = ZSTD_minGain(srcSize, strategy);
     size_t const lhSize = 3 + (srcSize >= 1 KB) + (srcSize >= 16 KB);
@@ -1708,9 +1883,9 @@
         int const preferRepeat = strategy < ZSTD_lazy ? srcSize <= 1024 : 0;
         if (repeat == HUF_repeat_valid && lhSize == 3) singleStream = 1;
         cLitSize = singleStream ? HUF_compress1X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
-                                      workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2)
+                                      workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2)
                                 : HUF_compress4X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
-                                      workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2);
+                                      workspace, wkspSize, (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2);
         if (repeat != HUF_repeat_none) {
             /* reused the existing table */
             hType = set_repeat;
@@ -1977,7 +2152,7 @@
         assert(!ZSTD_isError(NCountCost));
         assert(compressedCost < ERROR(maxCode));
         DEBUGLOG(5, "Estimated bit costs: basic=%u\trepeat=%u\tcompressed=%u",
-                    (U32)basicCost, (U32)repeatCost, (U32)compressedCost);
+                    (unsigned)basicCost, (unsigned)repeatCost, (unsigned)compressedCost);
         if (basicCost <= repeatCost && basicCost <= compressedCost) {
             DEBUGLOG(5, "Selected set_basic");
             assert(isDefaultAllowed);
@@ -1999,7 +2174,7 @@
 MEM_STATIC size_t
 ZSTD_buildCTable(void* dst, size_t dstCapacity,
                 FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e type,
-                U32* count, U32 max,
+                unsigned* count, U32 max,
                 const BYTE* codeTable, size_t nbSeq,
                 const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax,
                 const FSE_CTable* prevCTable, size_t prevCTableSize,
@@ -2007,11 +2182,13 @@
 {
     BYTE* op = (BYTE*)dst;
     const BYTE* const oend = op + dstCapacity;
+    DEBUGLOG(6, "ZSTD_buildCTable (dstCapacity=%u)", (unsigned)dstCapacity);
 
     switch (type) {
     case set_rle:
+        CHECK_F(FSE_buildCTable_rle(nextCTable, (BYTE)max));
+        if (dstCapacity==0) return ERROR(dstSize_tooSmall);
         *op = codeTable[0];
-        CHECK_F(FSE_buildCTable_rle(nextCTable, (BYTE)max));
         return 1;
     case set_repeat:
         memcpy(nextCTable, prevCTable, prevCTableSize);
@@ -2053,6 +2230,9 @@
     FSE_CState_t  stateLitLength;
 
     CHECK_E(BIT_initCStream(&blockStream, dst, dstCapacity), dstSize_tooSmall); /* not enough space remaining */
+    DEBUGLOG(6, "available space for bitstream : %i  (dstCapacity=%u)",
+                (int)(blockStream.endPtr - blockStream.startPtr),
+                (unsigned)dstCapacity);
 
     /* first symbols */
     FSE_initCState2(&stateMatchLength, CTable_MatchLength, mlCodeTable[nbSeq-1]);
@@ -2085,9 +2265,9 @@
             U32  const ofBits = ofCode;
             U32  const mlBits = ML_bits[mlCode];
             DEBUGLOG(6, "encoding: litlen:%2u - matchlen:%2u - offCode:%7u",
-                        sequences[n].litLength,
-                        sequences[n].matchLength + MINMATCH,
-                        sequences[n].offset);
+                        (unsigned)sequences[n].litLength,
+                        (unsigned)sequences[n].matchLength + MINMATCH,
+                        (unsigned)sequences[n].offset);
                                                                             /* 32b*/  /* 64b*/
                                                                             /* (7)*/  /* (7)*/
             FSE_encodeSymbol(&blockStream, &stateOffsetBits, ofCode);       /* 15 */  /* 15 */
@@ -2112,6 +2292,7 @@
                 BIT_addBits(&blockStream, sequences[n].offset, ofBits);     /* 31 */
             }
             BIT_flushBits(&blockStream);                                    /* (7)*/
+            DEBUGLOG(7, "remaining space : %i", (int)(blockStream.endPtr - blockStream.ptr));
     }   }
 
     DEBUGLOG(6, "ZSTD_encodeSequences: flushing ML state with %u bits", stateMatchLength.stateLog);
@@ -2169,6 +2350,7 @@
             FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
             seqDef const* sequences, size_t nbSeq, int longOffsets, int bmi2)
 {
+    DEBUGLOG(5, "ZSTD_encodeSequences: dstCapacity = %u", (unsigned)dstCapacity);
 #if DYNAMIC_BMI2
     if (bmi2) {
         return ZSTD_encodeSequences_bmi2(dst, dstCapacity,
@@ -2186,16 +2368,20 @@
                                         sequences, nbSeq, longOffsets);
 }
 
-MEM_STATIC size_t ZSTD_compressSequences_internal(seqStore_t* seqStorePtr,
-                              ZSTD_entropyCTables_t const* prevEntropy,
-                              ZSTD_entropyCTables_t* nextEntropy,
-                              ZSTD_CCtx_params const* cctxParams,
-                              void* dst, size_t dstCapacity, U32* workspace,
-                              const int bmi2)
+/* ZSTD_compressSequences_internal():
+ * actually compresses both literals and sequences */
+MEM_STATIC size_t
+ZSTD_compressSequences_internal(seqStore_t* seqStorePtr,
+                          const ZSTD_entropyCTables_t* prevEntropy,
+                                ZSTD_entropyCTables_t* nextEntropy,
+                          const ZSTD_CCtx_params* cctxParams,
+                                void* dst, size_t dstCapacity,
+                                void* workspace, size_t wkspSize,
+                          const int bmi2)
 {
     const int longOffsets = cctxParams->cParams.windowLog > STREAM_ACCUMULATOR_MIN;
     ZSTD_strategy const strategy = cctxParams->cParams.strategy;
-    U32 count[MaxSeq+1];
+    unsigned count[MaxSeq+1];
     FSE_CTable* CTable_LitLength = nextEntropy->fse.litlengthCTable;
     FSE_CTable* CTable_OffsetBits = nextEntropy->fse.offcodeCTable;
     FSE_CTable* CTable_MatchLength = nextEntropy->fse.matchlengthCTable;
@@ -2212,6 +2398,7 @@
     BYTE* lastNCount = NULL;
 
     ZSTD_STATIC_ASSERT(HUF_WORKSPACE_SIZE >= (1<<MAX(MLFSELog,LLFSELog)));
+    DEBUGLOG(5, "ZSTD_compressSequences_internal");
 
     /* Compress literals */
     {   const BYTE* const literals = seqStorePtr->litStart;
@@ -2222,7 +2409,8 @@
                                     cctxParams->cParams.strategy, disableLiteralCompression,
                                     op, dstCapacity,
                                     literals, litSize,
-                                    workspace, bmi2);
+                                    workspace, wkspSize,
+                                    bmi2);
         if (ZSTD_isError(cSize))
           return cSize;
         assert(cSize <= dstCapacity);
@@ -2249,51 +2437,63 @@
     /* convert length/distances into codes */
     ZSTD_seqToCodes(seqStorePtr);
     /* build CTable for Literal Lengths */
-    {   U32 max = MaxLL;
-        size_t const mostFrequent = HIST_countFast_wksp(count, &max, llCodeTable, nbSeq, workspace);   /* can't fail */
+    {   unsigned max = MaxLL;
+        size_t const mostFrequent = HIST_countFast_wksp(count, &max, llCodeTable, nbSeq, workspace, wkspSize);   /* can't fail */
         DEBUGLOG(5, "Building LL table");
         nextEntropy->fse.litlength_repeatMode = prevEntropy->fse.litlength_repeatMode;
-        LLtype = ZSTD_selectEncodingType(&nextEntropy->fse.litlength_repeatMode, count, max, mostFrequent, nbSeq, LLFSELog, prevEntropy->fse.litlengthCTable, LL_defaultNorm, LL_defaultNormLog, ZSTD_defaultAllowed, strategy);
+        LLtype = ZSTD_selectEncodingType(&nextEntropy->fse.litlength_repeatMode,
+                                        count, max, mostFrequent, nbSeq,
+                                        LLFSELog, prevEntropy->fse.litlengthCTable,
+                                        LL_defaultNorm, LL_defaultNormLog,
+                                        ZSTD_defaultAllowed, strategy);
         assert(set_basic < set_compressed && set_rle < set_compressed);
         assert(!(LLtype < set_compressed && nextEntropy->fse.litlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
         {   size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_LitLength, LLFSELog, (symbolEncodingType_e)LLtype,
                                                     count, max, llCodeTable, nbSeq, LL_defaultNorm, LL_defaultNormLog, MaxLL,
                                                     prevEntropy->fse.litlengthCTable, sizeof(prevEntropy->fse.litlengthCTable),
-                                                    workspace, HUF_WORKSPACE_SIZE);
+                                                    workspace, wkspSize);
             if (ZSTD_isError(countSize)) return countSize;
             if (LLtype == set_compressed)
                 lastNCount = op;
             op += countSize;
     }   }
     /* build CTable for Offsets */
-    {   U32 max = MaxOff;
-        size_t const mostFrequent = HIST_countFast_wksp(count, &max, ofCodeTable, nbSeq, workspace);  /* can't fail */
+    {   unsigned max = MaxOff;
+        size_t const mostFrequent = HIST_countFast_wksp(count, &max, ofCodeTable, nbSeq, workspace, wkspSize);  /* can't fail */
         /* We can only use the basic table if max <= DefaultMaxOff, otherwise the offsets are too large */
         ZSTD_defaultPolicy_e const defaultPolicy = (max <= DefaultMaxOff) ? ZSTD_defaultAllowed : ZSTD_defaultDisallowed;
         DEBUGLOG(5, "Building OF table");
         nextEntropy->fse.offcode_repeatMode = prevEntropy->fse.offcode_repeatMode;
-        Offtype = ZSTD_selectEncodingType(&nextEntropy->fse.offcode_repeatMode, count, max, mostFrequent, nbSeq, OffFSELog, prevEntropy->fse.offcodeCTable, OF_defaultNorm, OF_defaultNormLog, defaultPolicy, strategy);
+        Offtype = ZSTD_selectEncodingType(&nextEntropy->fse.offcode_repeatMode,
+                                        count, max, mostFrequent, nbSeq,
+                                        OffFSELog, prevEntropy->fse.offcodeCTable,
+                                        OF_defaultNorm, OF_defaultNormLog,
+                                        defaultPolicy, strategy);
         assert(!(Offtype < set_compressed && nextEntropy->fse.offcode_repeatMode != FSE_repeat_none)); /* We don't copy tables */
         {   size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_OffsetBits, OffFSELog, (symbolEncodingType_e)Offtype,
                                                     count, max, ofCodeTable, nbSeq, OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
                                                     prevEntropy->fse.offcodeCTable, sizeof(prevEntropy->fse.offcodeCTable),
-                                                    workspace, HUF_WORKSPACE_SIZE);
+                                                    workspace, wkspSize);
             if (ZSTD_isError(countSize)) return countSize;
             if (Offtype == set_compressed)
                 lastNCount = op;
             op += countSize;
     }   }
     /* build CTable for MatchLengths */
-    {   U32 max = MaxML;
-        size_t const mostFrequent = HIST_countFast_wksp(count, &max, mlCodeTable, nbSeq, workspace);   /* can't fail */
-        DEBUGLOG(5, "Building ML table");
+    {   unsigned max = MaxML;
+        size_t const mostFrequent = HIST_countFast_wksp(count, &max, mlCodeTable, nbSeq, workspace, wkspSize);   /* can't fail */
+        DEBUGLOG(5, "Building ML table (remaining space : %i)", (int)(oend-op));
         nextEntropy->fse.matchlength_repeatMode = prevEntropy->fse.matchlength_repeatMode;
-        MLtype = ZSTD_selectEncodingType(&nextEntropy->fse.matchlength_repeatMode, count, max, mostFrequent, nbSeq, MLFSELog, prevEntropy->fse.matchlengthCTable, ML_defaultNorm, ML_defaultNormLog, ZSTD_defaultAllowed, strategy);
+        MLtype = ZSTD_selectEncodingType(&nextEntropy->fse.matchlength_repeatMode,
+                                        count, max, mostFrequent, nbSeq,
+                                        MLFSELog, prevEntropy->fse.matchlengthCTable,
+                                        ML_defaultNorm, ML_defaultNormLog,
+                                        ZSTD_defaultAllowed, strategy);
         assert(!(MLtype < set_compressed && nextEntropy->fse.matchlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
         {   size_t const countSize = ZSTD_buildCTable(op, oend - op, CTable_MatchLength, MLFSELog, (symbolEncodingType_e)MLtype,
                                                     count, max, mlCodeTable, nbSeq, ML_defaultNorm, ML_defaultNormLog, MaxML,
                                                     prevEntropy->fse.matchlengthCTable, sizeof(prevEntropy->fse.matchlengthCTable),
-                                                    workspace, HUF_WORKSPACE_SIZE);
+                                                    workspace, wkspSize);
             if (ZSTD_isError(countSize)) return countSize;
             if (MLtype == set_compressed)
                 lastNCount = op;
@@ -2328,19 +2528,24 @@
         }
     }
 
+    DEBUGLOG(5, "compressed block size : %u", (unsigned)(op - ostart));
     return op - ostart;
 }
 
-MEM_STATIC size_t ZSTD_compressSequences(seqStore_t* seqStorePtr,
-                        const ZSTD_entropyCTables_t* prevEntropy,
-                              ZSTD_entropyCTables_t* nextEntropy,
-                        const ZSTD_CCtx_params* cctxParams,
-                              void* dst, size_t dstCapacity,
-                              size_t srcSize, U32* workspace, int bmi2)
+MEM_STATIC size_t
+ZSTD_compressSequences(seqStore_t* seqStorePtr,
+                       const ZSTD_entropyCTables_t* prevEntropy,
+                             ZSTD_entropyCTables_t* nextEntropy,
+                       const ZSTD_CCtx_params* cctxParams,
+                             void* dst, size_t dstCapacity,
+                             size_t srcSize,
+                             void* workspace, size_t wkspSize,
+                             int bmi2)
 {
     size_t const cSize = ZSTD_compressSequences_internal(
-            seqStorePtr, prevEntropy, nextEntropy, cctxParams, dst, dstCapacity,
-            workspace, bmi2);
+                            seqStorePtr, prevEntropy, nextEntropy, cctxParams,
+                            dst, dstCapacity,
+                            workspace, wkspSize, bmi2);
     if (cSize == 0) return 0;
     /* When srcSize <= dstCapacity, there is enough space to write a raw uncompressed block.
      * Since we ran out of space, block must be not compressible, so fall back to raw uncompressed block.
@@ -2362,7 +2567,7 @@
  * assumption : strat is a valid strategy */
 ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_dictMode_e dictMode)
 {
-    static const ZSTD_blockCompressor blockCompressor[3][(unsigned)ZSTD_btultra+1] = {
+    static const ZSTD_blockCompressor blockCompressor[3][ZSTD_STRATEGY_MAX+1] = {
         { ZSTD_compressBlock_fast  /* default for 0 */,
           ZSTD_compressBlock_fast,
           ZSTD_compressBlock_doubleFast,
@@ -2371,7 +2576,8 @@
           ZSTD_compressBlock_lazy2,
           ZSTD_compressBlock_btlazy2,
           ZSTD_compressBlock_btopt,
-          ZSTD_compressBlock_btultra },
+          ZSTD_compressBlock_btultra,
+          ZSTD_compressBlock_btultra2 },
         { ZSTD_compressBlock_fast_extDict  /* default for 0 */,
           ZSTD_compressBlock_fast_extDict,
           ZSTD_compressBlock_doubleFast_extDict,
@@ -2380,6 +2586,7 @@
           ZSTD_compressBlock_lazy2_extDict,
           ZSTD_compressBlock_btlazy2_extDict,
           ZSTD_compressBlock_btopt_extDict,
+          ZSTD_compressBlock_btultra_extDict,
           ZSTD_compressBlock_btultra_extDict },
         { ZSTD_compressBlock_fast_dictMatchState  /* default for 0 */,
           ZSTD_compressBlock_fast_dictMatchState,
@@ -2389,14 +2596,14 @@
           ZSTD_compressBlock_lazy2_dictMatchState,
           ZSTD_compressBlock_btlazy2_dictMatchState,
           ZSTD_compressBlock_btopt_dictMatchState,
+          ZSTD_compressBlock_btultra_dictMatchState,
           ZSTD_compressBlock_btultra_dictMatchState }
     };
     ZSTD_blockCompressor selectedCompressor;
     ZSTD_STATIC_ASSERT((unsigned)ZSTD_fast == 1);
 
-    assert((U32)strat >= (U32)ZSTD_fast);
-    assert((U32)strat <= (U32)ZSTD_btultra);
-    selectedCompressor = blockCompressor[(int)dictMode][(U32)strat];
+    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
+    selectedCompressor = blockCompressor[(int)dictMode][(int)strat];
     assert(selectedCompressor != NULL);
     return selectedCompressor;
 }
@@ -2421,15 +2628,15 @@
 {
     ZSTD_matchState_t* const ms = &zc->blockState.matchState;
     size_t cSize;
-    DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%zu, dictLimit=%u, nextToUpdate=%u)",
-                dstCapacity, ms->window.dictLimit, ms->nextToUpdate);
+    DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
+                (unsigned)dstCapacity, (unsigned)ms->window.dictLimit, (unsigned)ms->nextToUpdate);
     assert(srcSize <= ZSTD_BLOCKSIZE_MAX);
 
     /* Assert that we have correctly flushed the ctx params into the ms's copy */
     ZSTD_assertEqualCParams(zc->appliedParams.cParams, ms->cParams);
 
     if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) {
-        ZSTD_ldm_skipSequences(&zc->externSeqStore, srcSize, zc->appliedParams.cParams.searchLength);
+        ZSTD_ldm_skipSequences(&zc->externSeqStore, srcSize, zc->appliedParams.cParams.minMatch);
         cSize = 0;
         goto out;  /* don't even attempt compression below a certain srcSize */
     }
@@ -2437,8 +2644,8 @@
     ms->opt.symbolCosts = &zc->blockState.prevCBlock->entropy;   /* required for optimal parser to read stats from dictionary */
 
     /* a gap between an attached dict and the current window is not safe,
-     * they must remain adjacent, and when that stops being the case, the dict
-     * must be unset */
+     * they must remain adjacent,
+     * and when that stops being the case, the dict must be unset */
     assert(ms->dictMatchState == NULL || ms->loadedDictEnd == ms->window.dictLimit);
 
     /* limited update after a very long match */
@@ -2495,7 +2702,9 @@
             &zc->blockState.prevCBlock->entropy, &zc->blockState.nextCBlock->entropy,
             &zc->appliedParams,
             dst, dstCapacity,
-            srcSize, zc->entropyWorkspace, zc->bmi2);
+            srcSize,
+            zc->entropyWorkspace, HUF_WORKSPACE_SIZE /* statically allocated in resetCCtx */,
+            zc->bmi2);
 
 out:
     if (!ZSTD_isError(cSize) && cSize != 0) {
@@ -2535,7 +2744,7 @@
     U32 const maxDist = (U32)1 << cctx->appliedParams.cParams.windowLog;
     assert(cctx->appliedParams.cParams.windowLog <= 31);
 
-    DEBUGLOG(5, "ZSTD_compress_frameChunk (blockSize=%u)", (U32)blockSize);
+    DEBUGLOG(5, "ZSTD_compress_frameChunk (blockSize=%u)", (unsigned)blockSize);
     if (cctx->appliedParams.fParams.checksumFlag && srcSize)
         XXH64_update(&cctx->xxhState, src, srcSize);
 
@@ -2583,7 +2792,7 @@
             assert(dstCapacity >= cSize);
             dstCapacity -= cSize;
             DEBUGLOG(5, "ZSTD_compress_frameChunk: adding a block of size %u",
-                        (U32)cSize);
+                        (unsigned)cSize);
     }   }
 
     if (lastFrameChunk && (op>ostart)) cctx->stage = ZSTDcs_ending;
@@ -2606,9 +2815,9 @@
     size_t pos=0;
 
     assert(!(params.fParams.contentSizeFlag && pledgedSrcSize == ZSTD_CONTENTSIZE_UNKNOWN));
-    if (dstCapacity < ZSTD_frameHeaderSize_max) return ERROR(dstSize_tooSmall);
+    if (dstCapacity < ZSTD_FRAMEHEADERSIZE_MAX) return ERROR(dstSize_tooSmall);
     DEBUGLOG(4, "ZSTD_writeFrameHeader : dictIDFlag : %u ; dictID : %u ; dictIDSizeCode : %u",
-                !params.fParams.noDictIDFlag, dictID,  dictIDSizeCode);
+                !params.fParams.noDictIDFlag, (unsigned)dictID, (unsigned)dictIDSizeCode);
 
     if (params.format == ZSTD_f_zstd1) {
         MEM_writeLE32(dst, ZSTD_MAGICNUMBER);
@@ -2672,7 +2881,7 @@
     size_t fhSize = 0;
 
     DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u",
-                cctx->stage, (U32)srcSize);
+                cctx->stage, (unsigned)srcSize);
     if (cctx->stage==ZSTDcs_created) return ERROR(stage_wrong);   /* missing init (ZSTD_compressBegin) */
 
     if (frame && (cctx->stage==ZSTDcs_init)) {
@@ -2709,7 +2918,7 @@
         }
     }
 
-    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (U32)cctx->blockSize);
+    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSize);
     {   size_t const cSize = frame ?
                              ZSTD_compress_frameChunk (cctx, dst, dstCapacity, src, srcSize, lastFrameChunk) :
                              ZSTD_compressBlock_internal (cctx, dst, dstCapacity, src, srcSize);
@@ -2721,7 +2930,7 @@
             ZSTD_STATIC_ASSERT(ZSTD_CONTENTSIZE_UNKNOWN == (unsigned long long)-1);
             if (cctx->consumedSrcSize+1 > cctx->pledgedSrcSizePlusOne) {
                 DEBUGLOG(4, "error : pledgedSrcSize = %u, while realSrcSize >= %u",
-                    (U32)cctx->pledgedSrcSizePlusOne-1, (U32)cctx->consumedSrcSize);
+                    (unsigned)cctx->pledgedSrcSizePlusOne-1, (unsigned)cctx->consumedSrcSize);
                 return ERROR(srcSize_wrong);
             }
         }
@@ -2733,7 +2942,7 @@
                               void* dst, size_t dstCapacity,
                         const void* src, size_t srcSize)
 {
-    DEBUGLOG(5, "ZSTD_compressContinue (srcSize=%u)", (U32)srcSize);
+    DEBUGLOG(5, "ZSTD_compressContinue (srcSize=%u)", (unsigned)srcSize);
     return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 1 /* frame mode */, 0 /* last chunk */);
 }
 
@@ -2791,6 +3000,7 @@
     case ZSTD_btlazy2:   /* we want the dictionary table fully sorted */
     case ZSTD_btopt:
     case ZSTD_btultra:
+    case ZSTD_btultra2:
         if (srcSize >= HASH_READ_SIZE)
             ZSTD_updateTree(ms, iend-HASH_READ_SIZE, iend);
         break;
@@ -2861,7 +3071,9 @@
         if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted);
         /* Defer checking offcodeMaxValue because we need to know the size of the dictionary content */
         /* fill all offset symbols to avoid garbage at end of table */
-        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.offcodeCTable, offcodeNCount, MaxOff, offcodeLog, workspace, HUF_WORKSPACE_SIZE),
+        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.offcodeCTable,
+                                    offcodeNCount, MaxOff, offcodeLog,
+                                    workspace, HUF_WORKSPACE_SIZE),
                  dictionary_corrupted);
         dictPtr += offcodeHeaderSize;
     }
@@ -2873,7 +3085,9 @@
         if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted);
         /* Every match length code must have non-zero probability */
         CHECK_F( ZSTD_checkDictNCount(matchlengthNCount, matchlengthMaxValue, MaxML));
-        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog, workspace, HUF_WORKSPACE_SIZE),
+        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.matchlengthCTable,
+                                    matchlengthNCount, matchlengthMaxValue, matchlengthLog,
+                                    workspace, HUF_WORKSPACE_SIZE),
                  dictionary_corrupted);
         dictPtr += matchlengthHeaderSize;
     }
@@ -2885,7 +3099,9 @@
         if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted);
         /* Every literal length code must have non-zero probability */
         CHECK_F( ZSTD_checkDictNCount(litlengthNCount, litlengthMaxValue, MaxLL));
-        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog, workspace, HUF_WORKSPACE_SIZE),
+        CHECK_E( FSE_buildCTable_wksp(bs->entropy.fse.litlengthCTable,
+                                    litlengthNCount, litlengthMaxValue, litlengthLog,
+                                    workspace, HUF_WORKSPACE_SIZE),
                  dictionary_corrupted);
         dictPtr += litlengthHeaderSize;
     }
@@ -3023,7 +3239,7 @@
     ZSTD_parameters const params = ZSTD_getParams(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, dictSize);
     ZSTD_CCtx_params const cctxParams =
             ZSTD_assignParamsToCCtxParams(cctx->requestedParams, params);
-    DEBUGLOG(4, "ZSTD_compressBegin_usingDict (dictSize=%u)", (U32)dictSize);
+    DEBUGLOG(4, "ZSTD_compressBegin_usingDict (dictSize=%u)", (unsigned)dictSize);
     return ZSTD_compressBegin_internal(cctx, dict, dictSize, ZSTD_dct_auto, ZSTD_dtlm_fast, NULL,
                                        cctxParams, ZSTD_CONTENTSIZE_UNKNOWN, ZSTDb_not_buffered);
 }
@@ -3067,7 +3283,7 @@
     if (cctx->appliedParams.fParams.checksumFlag) {
         U32 const checksum = (U32) XXH64_digest(&cctx->xxhState);
         if (dstCapacity<4) return ERROR(dstSize_tooSmall);
-        DEBUGLOG(4, "ZSTD_writeEpilogue: write checksum : %08X", checksum);
+        DEBUGLOG(4, "ZSTD_writeEpilogue: write checksum : %08X", (unsigned)checksum);
         MEM_writeLE32(op, checksum);
         op += 4;
     }
@@ -3093,7 +3309,7 @@
         DEBUGLOG(4, "end of frame : controlling src size");
         if (cctx->pledgedSrcSizePlusOne != cctx->consumedSrcSize+1) {
             DEBUGLOG(4, "error : pledgedSrcSize = %u, while realSrcSize = %u",
-                (U32)cctx->pledgedSrcSizePlusOne-1, (U32)cctx->consumedSrcSize);
+                (unsigned)cctx->pledgedSrcSizePlusOne-1, (unsigned)cctx->consumedSrcSize);
             return ERROR(srcSize_wrong);
     }   }
     return cSize + endResult;
@@ -3139,7 +3355,7 @@
         const void* dict,size_t dictSize,
         ZSTD_CCtx_params params)
 {
-    DEBUGLOG(4, "ZSTD_compress_advanced_internal (srcSize:%u)", (U32)srcSize);
+    DEBUGLOG(4, "ZSTD_compress_advanced_internal (srcSize:%u)", (unsigned)srcSize);
     CHECK_F( ZSTD_compressBegin_internal(cctx,
                          dict, dictSize, ZSTD_dct_auto, ZSTD_dtlm_fast, NULL,
                          params, srcSize, ZSTDb_not_buffered) );
@@ -3163,7 +3379,7 @@
                    const void* src, size_t srcSize,
                          int compressionLevel)
 {
-    DEBUGLOG(4, "ZSTD_compressCCtx (srcSize=%u)", (U32)srcSize);
+    DEBUGLOG(4, "ZSTD_compressCCtx (srcSize=%u)", (unsigned)srcSize);
     assert(cctx != NULL);
     return ZSTD_compress_usingDict(cctx, dst, dstCapacity, src, srcSize, NULL, 0, compressionLevel);
 }
@@ -3189,7 +3405,7 @@
         size_t dictSize, ZSTD_compressionParameters cParams,
         ZSTD_dictLoadMethod_e dictLoadMethod)
 {
-    DEBUGLOG(5, "sizeof(ZSTD_CDict) : %u", (U32)sizeof(ZSTD_CDict));
+    DEBUGLOG(5, "sizeof(ZSTD_CDict) : %u", (unsigned)sizeof(ZSTD_CDict));
     return sizeof(ZSTD_CDict) + HUF_WORKSPACE_SIZE + ZSTD_sizeof_matchState(&cParams, /* forCCtx */ 0)
            + (dictLoadMethod == ZSTD_dlm_byRef ? 0 : dictSize);
 }
@@ -3203,7 +3419,7 @@
 size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict)
 {
     if (cdict==NULL) return 0;   /* support sizeof on NULL */
-    DEBUGLOG(5, "sizeof(*cdict) : %u", (U32)sizeof(*cdict));
+    DEBUGLOG(5, "sizeof(*cdict) : %u", (unsigned)sizeof(*cdict));
     return cdict->workspaceSize + (cdict->dictBuffer ? cdict->dictContentSize : 0) + sizeof(*cdict);
 }
 
@@ -3214,7 +3430,7 @@
                     ZSTD_dictContentType_e dictContentType,
                     ZSTD_compressionParameters cParams)
 {
-    DEBUGLOG(3, "ZSTD_initCDict_internal (dictContentType:%u)", (U32)dictContentType);
+    DEBUGLOG(3, "ZSTD_initCDict_internal (dictContentType:%u)", (unsigned)dictContentType);
     assert(!ZSTD_checkCParams(cParams));
     cdict->matchState.cParams = cParams;
     if ((dictLoadMethod == ZSTD_dlm_byRef) || (!dictBuffer) || (!dictSize)) {
@@ -3264,7 +3480,7 @@
                                       ZSTD_dictContentType_e dictContentType,
                                       ZSTD_compressionParameters cParams, ZSTD_customMem customMem)
 {
-    DEBUGLOG(3, "ZSTD_createCDict_advanced, mode %u", (U32)dictContentType);
+    DEBUGLOG(3, "ZSTD_createCDict_advanced, mode %u", (unsigned)dictContentType);
     if (!customMem.customAlloc ^ !customMem.customFree) return NULL;
 
     {   ZSTD_CDict* const cdict = (ZSTD_CDict*)ZSTD_malloc(sizeof(ZSTD_CDict), customMem);
@@ -3345,7 +3561,7 @@
     void* ptr;
     if ((size_t)workspace & 7) return NULL;  /* 8-aligned */
     DEBUGLOG(4, "(workspaceSize < neededSize) : (%u < %u) => %u",
-        (U32)workspaceSize, (U32)neededSize, (U32)(workspaceSize < neededSize));
+        (unsigned)workspaceSize, (unsigned)neededSize, (unsigned)(workspaceSize < neededSize));
     if (workspaceSize < neededSize) return NULL;
 
     if (dictLoadMethod == ZSTD_dlm_byCopy) {
@@ -3505,7 +3721,7 @@
 size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize)
 {
     ZSTD_CCtx_params params = zcs->requestedParams;
-    DEBUGLOG(4, "ZSTD_resetCStream: pledgedSrcSize = %u", (U32)pledgedSrcSize);
+    DEBUGLOG(4, "ZSTD_resetCStream: pledgedSrcSize = %u", (unsigned)pledgedSrcSize);
     if (pledgedSrcSize==0) pledgedSrcSize = ZSTD_CONTENTSIZE_UNKNOWN;
     params.fParams.contentSizeFlag = 1;
     return ZSTD_resetCStream_internal(zcs, NULL, 0, ZSTD_dct_auto, zcs->cdict, params, pledgedSrcSize);
@@ -3525,7 +3741,7 @@
     assert(!((dict) && (cdict)));  /* either dict or cdict, not both */
 
     if (dict && dictSize >= 8) {
-        DEBUGLOG(4, "loading dictionary of size %u", (U32)dictSize);
+        DEBUGLOG(4, "loading dictionary of size %u", (unsigned)dictSize);
         if (zcs->staticSize) {   /* static CCtx : never uses malloc */
             /* incompatible with internal cdict creation */
             return ERROR(memory_allocation);
@@ -3584,7 +3800,7 @@
                                  ZSTD_parameters params, unsigned long long pledgedSrcSize)
 {
     DEBUGLOG(4, "ZSTD_initCStream_advanced: pledgedSrcSize=%u, flag=%u",
-                (U32)pledgedSrcSize, params.fParams.contentSizeFlag);
+                (unsigned)pledgedSrcSize, params.fParams.contentSizeFlag);
     CHECK_F( ZSTD_checkCParams(params.cParams) );
     if ((pledgedSrcSize==0) && (params.fParams.contentSizeFlag==0)) pledgedSrcSize = ZSTD_CONTENTSIZE_UNKNOWN;  /* for compatibility with older programs relying on this behavior. Users should now specify ZSTD_CONTENTSIZE_UNKNOWN. This line will be removed in the future. */
     zcs->requestedParams = ZSTD_assignParamsToCCtxParams(zcs->requestedParams, params);
@@ -3612,8 +3828,15 @@
 
 /*======   Compression   ======*/
 
-MEM_STATIC size_t ZSTD_limitCopy(void* dst, size_t dstCapacity,
-                           const void* src, size_t srcSize)
+static size_t ZSTD_nextInputSizeHint(const ZSTD_CCtx* cctx)
+{
+    size_t hintInSize = cctx->inBuffTarget - cctx->inBuffPos;
+    if (hintInSize==0) hintInSize = cctx->blockSize;
+    return hintInSize;
+}
+
+static size_t ZSTD_limitCopy(void* dst, size_t dstCapacity,
+                       const void* src, size_t srcSize)
 {
     size_t const length = MIN(dstCapacity, srcSize);
     if (length) memcpy(dst, src, length);
@@ -3621,7 +3844,7 @@
 }
 
 /** ZSTD_compressStream_generic():
- *  internal function for all *compressStream*() variants and *compress_generic()
+ *  internal function for all *compressStream*() variants
  *  non-static, because can be called from zstdmt_compress.c
  * @return : hint size for next input */
 size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
@@ -3638,7 +3861,7 @@
     U32 someMoreWork = 1;
 
     /* check expectations */
-    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%u", (U32)flushMode);
+    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%u", (unsigned)flushMode);
     assert(zcs->inBuff != NULL);
     assert(zcs->inBuffSize > 0);
     assert(zcs->outBuff !=  NULL);
@@ -3660,12 +3883,12 @@
                 /* shortcut to compression pass directly into output buffer */
                 size_t const cSize = ZSTD_compressEnd(zcs,
                                                 op, oend-op, ip, iend-ip);
-                DEBUGLOG(4, "ZSTD_compressEnd : %u", (U32)cSize);
+                DEBUGLOG(4, "ZSTD_compressEnd : cSize=%u", (unsigned)cSize);
                 if (ZSTD_isError(cSize)) return cSize;
                 ip = iend;
                 op += cSize;
                 zcs->frameEnded = 1;
-                ZSTD_CCtx_reset(zcs);
+                ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
                 someMoreWork = 0; break;
             }
             /* complete loading into inBuffer */
@@ -3709,7 +3932,7 @@
                 if (zcs->inBuffTarget > zcs->inBuffSize)
                     zcs->inBuffPos = 0, zcs->inBuffTarget = zcs->blockSize;
                 DEBUGLOG(5, "inBuffTarget:%u / inBuffSize:%u",
-                         (U32)zcs->inBuffTarget, (U32)zcs->inBuffSize);
+                         (unsigned)zcs->inBuffTarget, (unsigned)zcs->inBuffSize);
                 if (!lastBlock)
                     assert(zcs->inBuffTarget <= zcs->inBuffSize);
                 zcs->inToCompress = zcs->inBuffPos;
@@ -3718,7 +3941,7 @@
                     if (zcs->frameEnded) {
                         DEBUGLOG(5, "Frame completed directly in outBuffer");
                         someMoreWork = 0;
-                        ZSTD_CCtx_reset(zcs);
+                        ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
                     }
                     break;
                 }
@@ -3733,7 +3956,7 @@
                 size_t const flushed = ZSTD_limitCopy(op, oend-op,
                             zcs->outBuff + zcs->outBuffFlushedSize, toFlush);
                 DEBUGLOG(5, "toFlush: %u into %u ==> flushed: %u",
-                            (U32)toFlush, (U32)(oend-op), (U32)flushed);
+                            (unsigned)toFlush, (unsigned)(oend-op), (unsigned)flushed);
                 op += flushed;
                 zcs->outBuffFlushedSize += flushed;
                 if (toFlush!=flushed) {
@@ -3746,7 +3969,7 @@
                 if (zcs->frameEnded) {
                     DEBUGLOG(5, "Frame completed on flush");
                     someMoreWork = 0;
-                    ZSTD_CCtx_reset(zcs);
+                    ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
                     break;
                 }
                 zcs->streamStage = zcss_load;
@@ -3761,28 +3984,34 @@
     input->pos = ip - istart;
     output->pos = op - ostart;
     if (zcs->frameEnded) return 0;
-    {   size_t hintInSize = zcs->inBuffTarget - zcs->inBuffPos;
-        if (hintInSize==0) hintInSize = zcs->blockSize;
-        return hintInSize;
+    return ZSTD_nextInputSizeHint(zcs);
+}
+
+static size_t ZSTD_nextInputSizeHint_MTorST(const ZSTD_CCtx* cctx)
+{
+#ifdef ZSTD_MULTITHREAD
+    if (cctx->appliedParams.nbWorkers >= 1) {
+        assert(cctx->mtctx != NULL);
+        return ZSTDMT_nextInputSizeHint(cctx->mtctx);
     }
+#endif
+    return ZSTD_nextInputSizeHint(cctx);
+
 }
 
 size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input)
 {
-    /* check conditions */
-    if (output->pos > output->size) return ERROR(GENERIC);
-    if (input->pos  > input->size)  return ERROR(GENERIC);
-
-    return ZSTD_compressStream_generic(zcs, output, input, ZSTD_e_continue);
+    CHECK_F( ZSTD_compressStream2(zcs, output, input, ZSTD_e_continue) );
+    return ZSTD_nextInputSizeHint_MTorST(zcs);
 }
 
 
-size_t ZSTD_compress_generic (ZSTD_CCtx* cctx,
-                              ZSTD_outBuffer* output,
-                              ZSTD_inBuffer* input,
-                              ZSTD_EndDirective endOp)
+size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
+                             ZSTD_outBuffer* output,
+                             ZSTD_inBuffer* input,
+                             ZSTD_EndDirective endOp)
 {
-    DEBUGLOG(5, "ZSTD_compress_generic, endOp=%u ", (U32)endOp);
+    DEBUGLOG(5, "ZSTD_compressStream2, endOp=%u ", (unsigned)endOp);
     /* check conditions */
     if (output->pos > output->size) return ERROR(GENERIC);
     if (input->pos  > input->size)  return ERROR(GENERIC);
@@ -3792,9 +4021,9 @@
     if (cctx->streamStage == zcss_init) {
         ZSTD_CCtx_params params = cctx->requestedParams;
         ZSTD_prefixDict const prefixDict = cctx->prefixDict;
-        memset(&cctx->prefixDict, 0, sizeof(cctx->prefixDict));  /* single usage */
-        assert(prefixDict.dict==NULL || cctx->cdict==NULL);   /* only one can be set */
-        DEBUGLOG(4, "ZSTD_compress_generic : transparent init stage");
+        memset(&cctx->prefixDict, 0, sizeof(cctx->prefixDict));   /* single usage */
+        assert(prefixDict.dict==NULL || cctx->cdict==NULL);    /* only one can be set */
+        DEBUGLOG(4, "ZSTD_compressStream2 : transparent init stage");
         if (endOp == ZSTD_e_end) cctx->pledgedSrcSizePlusOne = input->size + 1;  /* auto-fix pledgedSrcSize */
         params.cParams = ZSTD_getCParamsFromCCtxParams(
                 &cctx->requestedParams, cctx->pledgedSrcSizePlusOne-1, 0 /*dictSize*/);
@@ -3807,7 +4036,7 @@
         if (params.nbWorkers > 0) {
             /* mt context creation */
             if (cctx->mtctx == NULL) {
-                DEBUGLOG(4, "ZSTD_compress_generic: creating new mtctx for nbWorkers=%u",
+                DEBUGLOG(4, "ZSTD_compressStream2: creating new mtctx for nbWorkers=%u",
                             params.nbWorkers);
                 cctx->mtctx = ZSTDMT_createCCtx_advanced(params.nbWorkers, cctx->customMem);
                 if (cctx->mtctx == NULL) return ERROR(memory_allocation);
@@ -3829,6 +4058,7 @@
             assert(cctx->streamStage == zcss_load);
             assert(cctx->appliedParams.nbWorkers == 0);
     }   }
+    /* end of transparent initialization stage */
 
     /* compression stage */
 #ifdef ZSTD_MULTITHREAD
@@ -3840,18 +4070,18 @@
         {   size_t const flushMin = ZSTDMT_compressStream_generic(cctx->mtctx, output, input, endOp);
             if ( ZSTD_isError(flushMin)
               || (endOp == ZSTD_e_end && flushMin == 0) ) { /* compression completed */
-                ZSTD_CCtx_reset(cctx);
+                ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
             }
-            DEBUGLOG(5, "completed ZSTD_compress_generic delegating to ZSTDMT_compressStream_generic");
+            DEBUGLOG(5, "completed ZSTD_compressStream2 delegating to ZSTDMT_compressStream_generic");
             return flushMin;
     }   }
 #endif
     CHECK_F( ZSTD_compressStream_generic(cctx, output, input, endOp) );
-    DEBUGLOG(5, "completed ZSTD_compress_generic");
+    DEBUGLOG(5, "completed ZSTD_compressStream2");
     return cctx->outBuffContentSize - cctx->outBuffFlushedSize; /* remaining to flush */
 }
 
-size_t ZSTD_compress_generic_simpleArgs (
+size_t ZSTD_compressStream2_simpleArgs (
                             ZSTD_CCtx* cctx,
                             void* dst, size_t dstCapacity, size_t* dstPos,
                       const void* src, size_t srcSize, size_t* srcPos,
@@ -3859,13 +4089,33 @@
 {
     ZSTD_outBuffer output = { dst, dstCapacity, *dstPos };
     ZSTD_inBuffer  input  = { src, srcSize, *srcPos };
-    /* ZSTD_compress_generic() will check validity of dstPos and srcPos */
-    size_t const cErr = ZSTD_compress_generic(cctx, &output, &input, endOp);
+    /* ZSTD_compressStream2() will check validity of dstPos and srcPos */
+    size_t const cErr = ZSTD_compressStream2(cctx, &output, &input, endOp);
     *dstPos = output.pos;
     *srcPos = input.pos;
     return cErr;
 }
 
+size_t ZSTD_compress2(ZSTD_CCtx* cctx,
+                      void* dst, size_t dstCapacity,
+                      const void* src, size_t srcSize)
+{
+    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
+    {   size_t oPos = 0;
+        size_t iPos = 0;
+        size_t const result = ZSTD_compressStream2_simpleArgs(cctx,
+                                        dst, dstCapacity, &oPos,
+                                        src, srcSize, &iPos,
+                                        ZSTD_e_end);
+        if (ZSTD_isError(result)) return result;
+        if (result != 0) {  /* compression not completed, due to lack of output space */
+            assert(oPos == dstCapacity);
+            return ERROR(dstSize_tooSmall);
+        }
+        assert(iPos == srcSize);   /* all input is expected consumed */
+        return oPos;
+    }
+}
 
 /*======   Finalize   ======*/
 
@@ -3874,21 +4124,21 @@
 size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output)
 {
     ZSTD_inBuffer input = { NULL, 0, 0 };
-    if (output->pos > output->size) return ERROR(GENERIC);
-    CHECK_F( ZSTD_compressStream_generic(zcs, output, &input, ZSTD_e_flush) );
-    return zcs->outBuffContentSize - zcs->outBuffFlushedSize;  /* remaining to flush */
+    return ZSTD_compressStream2(zcs, output, &input, ZSTD_e_flush);
 }
 
 
 size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output)
 {
     ZSTD_inBuffer input = { NULL, 0, 0 };
-    if (output->pos > output->size) return ERROR(GENERIC);
-    CHECK_F( ZSTD_compressStream_generic(zcs, output, &input, ZSTD_e_end) );
+    size_t const remainingToFlush = ZSTD_compressStream2(zcs, output, &input, ZSTD_e_end);
+    CHECK_F( remainingToFlush );
+    if (zcs->appliedParams.nbWorkers > 0) return remainingToFlush;   /* minimal estimation */
+    /* single thread mode : attempt to calculate remaining to flush more precisely */
     {   size_t const lastBlockSize = zcs->frameEnded ? 0 : ZSTD_BLOCKHEADERSIZE;
         size_t const checksumSize = zcs->frameEnded ? 0 : zcs->appliedParams.fParams.checksumFlag * 4;
-        size_t const toFlush = zcs->outBuffContentSize - zcs->outBuffFlushedSize + lastBlockSize + checksumSize;
-        DEBUGLOG(4, "ZSTD_endStream : remaining to flush : %u", (U32)toFlush);
+        size_t const toFlush = remainingToFlush + lastBlockSize + checksumSize;
+        DEBUGLOG(4, "ZSTD_endStream : remaining to flush : %u", (unsigned)toFlush);
         return toFlush;
     }
 }
@@ -3905,27 +4155,27 @@
     /* W,  C,  H,  S,  L, TL, strat */
     { 19, 12, 13,  1,  6,  1, ZSTD_fast    },  /* base for negative levels */
     { 19, 13, 14,  1,  7,  0, ZSTD_fast    },  /* level  1 */
-    { 19, 15, 16,  1,  6,  0, ZSTD_fast    },  /* level  2 */
-    { 20, 16, 17,  1,  5,  1, ZSTD_dfast   },  /* level  3 */
-    { 20, 18, 18,  1,  5,  1, ZSTD_dfast   },  /* level  4 */
-    { 20, 18, 18,  2,  5,  2, ZSTD_greedy  },  /* level  5 */
-    { 21, 18, 19,  2,  5,  4, ZSTD_lazy    },  /* level  6 */
-    { 21, 18, 19,  3,  5,  8, ZSTD_lazy2   },  /* level  7 */
+    { 20, 15, 16,  1,  6,  0, ZSTD_fast    },  /* level  2 */
+    { 21, 16, 17,  1,  5,  1, ZSTD_dfast   },  /* level  3 */
+    { 21, 18, 18,  1,  5,  1, ZSTD_dfast   },  /* level  4 */
+    { 21, 18, 19,  2,  5,  2, ZSTD_greedy  },  /* level  5 */
+    { 21, 19, 19,  3,  5,  4, ZSTD_greedy  },  /* level  6 */
+    { 21, 19, 19,  3,  5,  8, ZSTD_lazy    },  /* level  7 */
     { 21, 19, 19,  3,  5, 16, ZSTD_lazy2   },  /* level  8 */
     { 21, 19, 20,  4,  5, 16, ZSTD_lazy2   },  /* level  9 */
-    { 21, 20, 21,  4,  5, 16, ZSTD_lazy2   },  /* level 10 */
-    { 21, 21, 22,  4,  5, 16, ZSTD_lazy2   },  /* level 11 */
-    { 22, 20, 22,  5,  5, 16, ZSTD_lazy2   },  /* level 12 */
-    { 22, 21, 22,  4,  5, 32, ZSTD_btlazy2 },  /* level 13 */
-    { 22, 21, 22,  5,  5, 32, ZSTD_btlazy2 },  /* level 14 */
-    { 22, 22, 22,  6,  5, 32, ZSTD_btlazy2 },  /* level 15 */
-    { 22, 21, 22,  4,  5, 48, ZSTD_btopt   },  /* level 16 */
-    { 23, 22, 22,  4,  4, 64, ZSTD_btopt   },  /* level 17 */
-    { 23, 23, 22,  6,  3,256, ZSTD_btopt   },  /* level 18 */
-    { 23, 24, 22,  7,  3,256, ZSTD_btultra },  /* level 19 */
-    { 25, 25, 23,  7,  3,256, ZSTD_btultra },  /* level 20 */
-    { 26, 26, 24,  7,  3,512, ZSTD_btultra },  /* level 21 */
-    { 27, 27, 25,  9,  3,999, ZSTD_btultra },  /* level 22 */
+    { 22, 20, 21,  4,  5, 16, ZSTD_lazy2   },  /* level 10 */
+    { 22, 21, 22,  4,  5, 16, ZSTD_lazy2   },  /* level 11 */
+    { 22, 21, 22,  5,  5, 16, ZSTD_lazy2   },  /* level 12 */
+    { 22, 21, 22,  5,  5, 32, ZSTD_btlazy2 },  /* level 13 */
+    { 22, 22, 23,  5,  5, 32, ZSTD_btlazy2 },  /* level 14 */
+    { 22, 23, 23,  6,  5, 32, ZSTD_btlazy2 },  /* level 15 */
+    { 22, 22, 22,  5,  5, 48, ZSTD_btopt   },  /* level 16 */
+    { 23, 23, 22,  5,  4, 64, ZSTD_btopt   },  /* level 17 */
+    { 23, 23, 22,  6,  3, 64, ZSTD_btultra },  /* level 18 */
+    { 23, 24, 22,  7,  3,256, ZSTD_btultra2},  /* level 19 */
+    { 25, 25, 23,  7,  3,256, ZSTD_btultra2},  /* level 20 */
+    { 26, 26, 24,  7,  3,512, ZSTD_btultra2},  /* level 21 */
+    { 27, 27, 25,  9,  3,999, ZSTD_btultra2},  /* level 22 */
 },
 {   /* for srcSize <= 256 KB */
     /* W,  C,  H,  S,  L,  T, strat */
@@ -3940,18 +4190,18 @@
     { 18, 18, 19,  4,  4,  8, ZSTD_lazy2   },  /* level  8 */
     { 18, 18, 19,  5,  4,  8, ZSTD_lazy2   },  /* level  9 */
     { 18, 18, 19,  6,  4,  8, ZSTD_lazy2   },  /* level 10 */
-    { 18, 18, 19,  5,  4, 16, ZSTD_btlazy2 },  /* level 11.*/
-    { 18, 19, 19,  6,  4, 16, ZSTD_btlazy2 },  /* level 12.*/
-    { 18, 19, 19,  8,  4, 16, ZSTD_btlazy2 },  /* level 13 */
-    { 18, 18, 19,  4,  4, 24, ZSTD_btopt   },  /* level 14.*/
-    { 18, 18, 19,  4,  3, 24, ZSTD_btopt   },  /* level 15.*/
-    { 18, 19, 19,  6,  3, 64, ZSTD_btopt   },  /* level 16.*/
-    { 18, 19, 19,  8,  3,128, ZSTD_btopt   },  /* level 17.*/
-    { 18, 19, 19, 10,  3,256, ZSTD_btopt   },  /* level 18.*/
-    { 18, 19, 19, 10,  3,256, ZSTD_btultra },  /* level 19.*/
-    { 18, 19, 19, 11,  3,512, ZSTD_btultra },  /* level 20.*/
-    { 18, 19, 19, 12,  3,512, ZSTD_btultra },  /* level 21.*/
-    { 18, 19, 19, 13,  3,999, ZSTD_btultra },  /* level 22.*/
+    { 18, 18, 19,  5,  4, 12, ZSTD_btlazy2 },  /* level 11.*/
+    { 18, 19, 19,  7,  4, 12, ZSTD_btlazy2 },  /* level 12.*/
+    { 18, 18, 19,  4,  4, 16, ZSTD_btopt   },  /* level 13 */
+    { 18, 18, 19,  4,  3, 32, ZSTD_btopt   },  /* level 14.*/
+    { 18, 18, 19,  6,  3,128, ZSTD_btopt   },  /* level 15.*/
+    { 18, 19, 19,  6,  3,128, ZSTD_btultra },  /* level 16.*/
+    { 18, 19, 19,  8,  3,256, ZSTD_btultra },  /* level 17.*/
+    { 18, 19, 19,  6,  3,128, ZSTD_btultra2},  /* level 18.*/
+    { 18, 19, 19,  8,  3,256, ZSTD_btultra2},  /* level 19.*/
+    { 18, 19, 19, 10,  3,512, ZSTD_btultra2},  /* level 20.*/
+    { 18, 19, 19, 12,  3,512, ZSTD_btultra2},  /* level 21.*/
+    { 18, 19, 19, 13,  3,999, ZSTD_btultra2},  /* level 22.*/
 },
 {   /* for srcSize <= 128 KB */
     /* W,  C,  H,  S,  L,  T, strat */
@@ -3966,26 +4216,26 @@
     { 17, 17, 17,  4,  4,  8, ZSTD_lazy2   },  /* level  8 */
     { 17, 17, 17,  5,  4,  8, ZSTD_lazy2   },  /* level  9 */
     { 17, 17, 17,  6,  4,  8, ZSTD_lazy2   },  /* level 10 */
-    { 17, 17, 17,  7,  4,  8, ZSTD_lazy2   },  /* level 11 */
-    { 17, 18, 17,  6,  4, 16, ZSTD_btlazy2 },  /* level 12 */
-    { 17, 18, 17,  8,  4, 16, ZSTD_btlazy2 },  /* level 13.*/
-    { 17, 18, 17,  4,  4, 32, ZSTD_btopt   },  /* level 14.*/
-    { 17, 18, 17,  6,  3, 64, ZSTD_btopt   },  /* level 15.*/
-    { 17, 18, 17,  7,  3,128, ZSTD_btopt   },  /* level 16.*/
-    { 17, 18, 17,  7,  3,256, ZSTD_btopt   },  /* level 17.*/
-    { 17, 18, 17,  8,  3,256, ZSTD_btopt   },  /* level 18.*/
-    { 17, 18, 17,  8,  3,256, ZSTD_btultra },  /* level 19.*/
-    { 17, 18, 17,  9,  3,256, ZSTD_btultra },  /* level 20.*/
-    { 17, 18, 17, 10,  3,256, ZSTD_btultra },  /* level 21.*/
-    { 17, 18, 17, 11,  3,512, ZSTD_btultra },  /* level 22.*/
+    { 17, 17, 17,  5,  4,  8, ZSTD_btlazy2 },  /* level 11 */
+    { 17, 18, 17,  7,  4, 12, ZSTD_btlazy2 },  /* level 12 */
+    { 17, 18, 17,  3,  4, 12, ZSTD_btopt   },  /* level 13.*/
+    { 17, 18, 17,  4,  3, 32, ZSTD_btopt   },  /* level 14.*/
+    { 17, 18, 17,  6,  3,256, ZSTD_btopt   },  /* level 15.*/
+    { 17, 18, 17,  6,  3,128, ZSTD_btultra },  /* level 16.*/
+    { 17, 18, 17,  8,  3,256, ZSTD_btultra },  /* level 17.*/
+    { 17, 18, 17, 10,  3,512, ZSTD_btultra },  /* level 18.*/
+    { 17, 18, 17,  5,  3,256, ZSTD_btultra2},  /* level 19.*/
+    { 17, 18, 17,  7,  3,512, ZSTD_btultra2},  /* level 20.*/
+    { 17, 18, 17,  9,  3,512, ZSTD_btultra2},  /* level 21.*/
+    { 17, 18, 17, 11,  3,999, ZSTD_btultra2},  /* level 22.*/
 },
 {   /* for srcSize <= 16 KB */
     /* W,  C,  H,  S,  L,  T, strat */
     { 14, 12, 13,  1,  5,  1, ZSTD_fast    },  /* base for negative levels */
     { 14, 14, 15,  1,  5,  0, ZSTD_fast    },  /* level  1 */
     { 14, 14, 15,  1,  4,  0, ZSTD_fast    },  /* level  2 */
-    { 14, 14, 14,  2,  4,  1, ZSTD_dfast   },  /* level  3.*/
-    { 14, 14, 14,  4,  4,  2, ZSTD_greedy  },  /* level  4.*/
+    { 14, 14, 15,  2,  4,  1, ZSTD_dfast   },  /* level  3 */
+    { 14, 14, 14,  4,  4,  2, ZSTD_greedy  },  /* level  4 */
     { 14, 14, 14,  3,  4,  4, ZSTD_lazy    },  /* level  5.*/
     { 14, 14, 14,  4,  4,  8, ZSTD_lazy2   },  /* level  6 */
     { 14, 14, 14,  6,  4,  8, ZSTD_lazy2   },  /* level  7 */
@@ -3993,17 +4243,17 @@
     { 14, 15, 14,  5,  4,  8, ZSTD_btlazy2 },  /* level  9.*/
     { 14, 15, 14,  9,  4,  8, ZSTD_btlazy2 },  /* level 10.*/
     { 14, 15, 14,  3,  4, 12, ZSTD_btopt   },  /* level 11.*/
-    { 14, 15, 14,  6,  3, 16, ZSTD_btopt   },  /* level 12.*/
-    { 14, 15, 14,  6,  3, 24, ZSTD_btopt   },  /* level 13.*/
-    { 14, 15, 15,  6,  3, 48, ZSTD_btopt   },  /* level 14.*/
-    { 14, 15, 15,  6,  3, 64, ZSTD_btopt   },  /* level 15.*/
-    { 14, 15, 15,  6,  3, 96, ZSTD_btopt   },  /* level 16.*/
-    { 14, 15, 15,  6,  3,128, ZSTD_btopt   },  /* level 17.*/
-    { 14, 15, 15,  8,  3,256, ZSTD_btopt   },  /* level 18.*/
-    { 14, 15, 15,  6,  3,256, ZSTD_btultra },  /* level 19.*/
-    { 14, 15, 15,  8,  3,256, ZSTD_btultra },  /* level 20.*/
-    { 14, 15, 15,  9,  3,256, ZSTD_btultra },  /* level 21.*/
-    { 14, 15, 15, 10,  3,512, ZSTD_btultra },  /* level 22.*/
+    { 14, 15, 14,  4,  3, 24, ZSTD_btopt   },  /* level 12.*/
+    { 14, 15, 14,  5,  3, 32, ZSTD_btultra },  /* level 13.*/
+    { 14, 15, 15,  6,  3, 64, ZSTD_btultra },  /* level 14.*/
+    { 14, 15, 15,  7,  3,256, ZSTD_btultra },  /* level 15.*/
+    { 14, 15, 15,  5,  3, 48, ZSTD_btultra2},  /* level 16.*/
+    { 14, 15, 15,  6,  3,128, ZSTD_btultra2},  /* level 17.*/
+    { 14, 15, 15,  7,  3,256, ZSTD_btultra2},  /* level 18.*/
+    { 14, 15, 15,  8,  3,256, ZSTD_btultra2},  /* level 19.*/
+    { 14, 15, 15,  8,  3,512, ZSTD_btultra2},  /* level 20.*/
+    { 14, 15, 15,  9,  3,512, ZSTD_btultra2},  /* level 21.*/
+    { 14, 15, 15, 10,  3,999, ZSTD_btultra2},  /* level 22.*/
 },
 };
 
@@ -4022,8 +4272,8 @@
     if (compressionLevel > ZSTD_MAX_CLEVEL) row = ZSTD_MAX_CLEVEL;
     {   ZSTD_compressionParameters cp = ZSTD_defaultCParameters[tableID][row];
         if (compressionLevel < 0) cp.targetLength = (unsigned)(-compressionLevel);   /* acceleration factor */
-        return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize); }
-
+        return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize);
+    }
 }
 
 /*! ZSTD_getParams() :
diff --git a/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h b/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h
--- a/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h
+++ b/contrib/python-zstandard/zstd/compress/zstd_compress_internal.h
@@ -48,12 +48,6 @@
 typedef enum { ZSTDcs_created=0, ZSTDcs_init, ZSTDcs_ongoing, ZSTDcs_ending } ZSTD_compressionStage_e;
 typedef enum { zcss_init=0, zcss_load, zcss_flush } ZSTD_cStreamStage;
 
-typedef enum {
-    ZSTD_dictDefaultAttach = 0,
-    ZSTD_dictForceAttach = 1,
-    ZSTD_dictForceCopy = -1,
-} ZSTD_dictAttachPref_e;
-
 typedef struct ZSTD_prefixDict_s {
     const void* dict;
     size_t dictSize;
@@ -96,10 +90,10 @@
 
 typedef struct {
     /* All tables are allocated inside cctx->workspace by ZSTD_resetCCtx_internal() */
-    U32* litFreq;                /* table of literals statistics, of size 256 */
-    U32* litLengthFreq;          /* table of litLength statistics, of size (MaxLL+1) */
-    U32* matchLengthFreq;        /* table of matchLength statistics, of size (MaxML+1) */
-    U32* offCodeFreq;            /* table of offCode statistics, of size (MaxOff+1) */
+    unsigned* litFreq;           /* table of literals statistics, of size 256 */
+    unsigned* litLengthFreq;     /* table of litLength statistics, of size (MaxLL+1) */
+    unsigned* matchLengthFreq;   /* table of matchLength statistics, of size (MaxML+1) */
+    unsigned* offCodeFreq;       /* table of offCode statistics, of size (MaxOff+1) */
     ZSTD_match_t* matchTable;    /* list of found matches, of size ZSTD_OPT_NUM+1 */
     ZSTD_optimal_t* priceTable;  /* All positions tracked by optimal parser, of size ZSTD_OPT_NUM+1 */
 
@@ -139,7 +133,7 @@
     U32* hashTable3;
     U32* chainTable;
     optState_t opt;         /* optimal parser state */
-    const ZSTD_matchState_t *dictMatchState;
+    const ZSTD_matchState_t * dictMatchState;
     ZSTD_compressionParameters cParams;
 };
 
@@ -167,7 +161,7 @@
     U32 hashLog;            /* Log size of hashTable */
     U32 bucketSizeLog;      /* Log bucket size for collision resolution, at most 8 */
     U32 minMatchLength;     /* Minimum match length */
-    U32 hashEveryLog;       /* Log number of entries to skip */
+    U32 hashRateLog;       /* Log number of entries to skip */
     U32 windowLog;          /* Window log for the LDM */
 } ldmParams_t;
 
@@ -196,9 +190,10 @@
     ZSTD_dictAttachPref_e attachDictPref;
 
     /* Multithreading: used to pass parameters to mtctx */
-    unsigned nbWorkers;
-    unsigned jobSize;
-    unsigned overlapSizeLog;
+    int nbWorkers;
+    size_t jobSize;
+    int overlapLog;
+    int rsyncable;
 
     /* Long distance matching parameters */
     ldmParams_t ldmParams;
@@ -498,6 +493,64 @@
     }
 }
 
+/** ZSTD_ipow() :
+ * Return base^exponent.
+ */
+static U64 ZSTD_ipow(U64 base, U64 exponent)
+{
+    U64 power = 1;
+    while (exponent) {
+      if (exponent & 1) power *= base;
+      exponent >>= 1;
+      base *= base;
+    }
+    return power;
+}
+
+#define ZSTD_ROLL_HASH_CHAR_OFFSET 10
+
+/** ZSTD_rollingHash_append() :
+ * Add the buffer to the hash value.
+ */
+static U64 ZSTD_rollingHash_append(U64 hash, void const* buf, size_t size)
+{
+    BYTE const* istart = (BYTE const*)buf;
+    size_t pos;
+    for (pos = 0; pos < size; ++pos) {
+        hash *= prime8bytes;
+        hash += istart[pos] + ZSTD_ROLL_HASH_CHAR_OFFSET;
+    }
+    return hash;
+}
+
+/** ZSTD_rollingHash_compute() :
+ * Compute the rolling hash value of the buffer.
+ */
+MEM_STATIC U64 ZSTD_rollingHash_compute(void const* buf, size_t size)
+{
+    return ZSTD_rollingHash_append(0, buf, size);
+}
+
+/** ZSTD_rollingHash_primePower() :
+ * Compute the primePower to be passed to ZSTD_rollingHash_rotate() for a hash
+ * over a window of length bytes.
+ */
+MEM_STATIC U64 ZSTD_rollingHash_primePower(U32 length)
+{
+    return ZSTD_ipow(prime8bytes, length - 1);
+}
+
+/** ZSTD_rollingHash_rotate() :
+ * Rotate the rolling hash by one byte.
+ */
+MEM_STATIC U64 ZSTD_rollingHash_rotate(U64 hash, BYTE toRemove, BYTE toAdd, U64 primePower)
+{
+    hash -= (toRemove + ZSTD_ROLL_HASH_CHAR_OFFSET) * primePower;
+    hash *= prime8bytes;
+    hash += toAdd + ZSTD_ROLL_HASH_CHAR_OFFSET;
+    return hash;
+}
+
 /*-*************************************
 *  Round buffer management
 ***************************************/
@@ -626,20 +679,23 @@
  * dictMatchState mode, lowLimit and dictLimit are the same, and the dictionary
  * is below them. forceWindow and dictMatchState are therefore incompatible.
  */
-MEM_STATIC void ZSTD_window_enforceMaxDist(ZSTD_window_t* window,
-                                           void const* srcEnd, U32 maxDist,
-                                           U32* loadedDictEndPtr,
-                                           const ZSTD_matchState_t** dictMatchStatePtr)
+MEM_STATIC void
+ZSTD_window_enforceMaxDist(ZSTD_window_t* window,
+                           void const* srcEnd,
+                           U32 maxDist,
+                           U32* loadedDictEndPtr,
+                     const ZSTD_matchState_t** dictMatchStatePtr)
 {
-    U32 const current = (U32)((BYTE const*)srcEnd - window->base);
-    U32 loadedDictEnd = loadedDictEndPtr != NULL ? *loadedDictEndPtr : 0;
-    DEBUGLOG(5, "ZSTD_window_enforceMaxDist: current=%u, maxDist=%u", current, maxDist);
-    if (current > maxDist + loadedDictEnd) {
-        U32 const newLowLimit = current - maxDist;
+    U32 const blockEndIdx = (U32)((BYTE const*)srcEnd - window->base);
+    U32 loadedDictEnd = (loadedDictEndPtr != NULL) ? *loadedDictEndPtr : 0;
+    DEBUGLOG(5, "ZSTD_window_enforceMaxDist: blockEndIdx=%u, maxDist=%u",
+                (unsigned)blockEndIdx, (unsigned)maxDist);
+    if (blockEndIdx > maxDist + loadedDictEnd) {
+        U32 const newLowLimit = blockEndIdx - maxDist;
         if (window->lowLimit < newLowLimit) window->lowLimit = newLowLimit;
         if (window->dictLimit < window->lowLimit) {
             DEBUGLOG(5, "Update dictLimit to match lowLimit, from %u to %u",
-                        window->dictLimit, window->lowLimit);
+                        (unsigned)window->dictLimit, (unsigned)window->lowLimit);
             window->dictLimit = window->lowLimit;
         }
         if (loadedDictEndPtr)
@@ -690,20 +746,23 @@
 
 
 /* debug functions */
+#if (DEBUGLEVEL>=2)
 
 MEM_STATIC double ZSTD_fWeight(U32 rawStat)
 {
     U32 const fp_accuracy = 8;
     U32 const fp_multiplier = (1 << fp_accuracy);
-    U32 const stat = rawStat + 1;
-    U32 const hb = ZSTD_highbit32(stat);
+    U32 const newStat = rawStat + 1;
+    U32 const hb = ZSTD_highbit32(newStat);
     U32 const BWeight = hb * fp_multiplier;
-    U32 const FWeight = (stat << fp_accuracy) >> hb;
+    U32 const FWeight = (newStat << fp_accuracy) >> hb;
     U32 const weight = BWeight + FWeight;
     assert(hb + fp_accuracy < 31);
     return (double)weight / fp_multiplier;
 }
 
+/* display a table content,
+ * listing each element, its frequency, and its predicted bit cost */
 MEM_STATIC void ZSTD_debugTable(const U32* table, U32 max)
 {
     unsigned u, sum;
@@ -715,6 +774,9 @@
     }
 }
 
+#endif
+
+
 #if defined (__cplusplus)
 }
 #endif
diff --git a/contrib/python-zstandard/zstd/compress/zstd_double_fast.c b/contrib/python-zstandard/zstd/compress/zstd_double_fast.c
--- a/contrib/python-zstandard/zstd/compress/zstd_double_fast.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_double_fast.c
@@ -18,7 +18,7 @@
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
     U32* const hashLarge = ms->hashTable;
     U32  const hBitsL = cParams->hashLog;
-    U32  const mls = cParams->searchLength;
+    U32  const mls = cParams->minMatch;
     U32* const hashSmall = ms->chainTable;
     U32  const hBitsS = cParams->chainLog;
     const BYTE* const base = ms->window.base;
@@ -309,7 +309,7 @@
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize)
 {
-    const U32 mls = ms->cParams.searchLength;
+    const U32 mls = ms->cParams.minMatch;
     switch(mls)
     {
     default: /* includes case 3 */
@@ -329,7 +329,7 @@
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize)
 {
-    const U32 mls = ms->cParams.searchLength;
+    const U32 mls = ms->cParams.minMatch;
     switch(mls)
     {
     default: /* includes case 3 */
@@ -483,7 +483,7 @@
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize)
 {
-    U32 const mls = ms->cParams.searchLength;
+    U32 const mls = ms->cParams.minMatch;
     switch(mls)
     {
     default: /* includes case 3 */
diff --git a/contrib/python-zstandard/zstd/compress/zstd_fast.c b/contrib/python-zstandard/zstd/compress/zstd_fast.c
--- a/contrib/python-zstandard/zstd/compress/zstd_fast.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_fast.c
@@ -18,7 +18,7 @@
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
     U32* const hashTable = ms->hashTable;
     U32  const hBits = cParams->hashLog;
-    U32  const mls = cParams->searchLength;
+    U32  const mls = cParams->minMatch;
     const BYTE* const base = ms->window.base;
     const BYTE* ip = base + ms->nextToUpdate;
     const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
@@ -27,18 +27,18 @@
     /* Always insert every fastHashFillStep position into the hash table.
      * Insert the other positions if their hash entry is empty.
      */
-    for (; ip + fastHashFillStep - 1 <= iend; ip += fastHashFillStep) {
+    for ( ; ip + fastHashFillStep < iend + 2; ip += fastHashFillStep) {
         U32 const current = (U32)(ip - base);
-        U32 i;
-        for (i = 0; i < fastHashFillStep; ++i) {
-            size_t const hash = ZSTD_hashPtr(ip + i, hBits, mls);
-            if (i == 0 || hashTable[hash] == 0)
-                hashTable[hash] = current + i;
-            /* Only load extra positions for ZSTD_dtlm_full */
-            if (dtlm == ZSTD_dtlm_fast)
-                break;
-        }
-    }
+        size_t const hash0 = ZSTD_hashPtr(ip, hBits, mls);
+        hashTable[hash0] = current;
+        if (dtlm == ZSTD_dtlm_fast) continue;
+        /* Only load extra positions for ZSTD_dtlm_full */
+        {   U32 p;
+            for (p = 1; p < fastHashFillStep; ++p) {
+                size_t const hash = ZSTD_hashPtr(ip + p, hBits, mls);
+                if (hashTable[hash] == 0) {  /* not yet filled */
+                    hashTable[hash] = current + p;
+    }   }   }   }
 }
 
 FORCE_INLINE_TEMPLATE
@@ -235,7 +235,7 @@
         void const* src, size_t srcSize)
 {
     ZSTD_compressionParameters const* cParams = &ms->cParams;
-    U32 const mls = cParams->searchLength;
+    U32 const mls = cParams->minMatch;
     assert(ms->dictMatchState == NULL);
     switch(mls)
     {
@@ -256,7 +256,7 @@
         void const* src, size_t srcSize)
 {
     ZSTD_compressionParameters const* cParams = &ms->cParams;
-    U32 const mls = cParams->searchLength;
+    U32 const mls = cParams->minMatch;
     assert(ms->dictMatchState != NULL);
     switch(mls)
     {
@@ -375,7 +375,7 @@
         void const* src, size_t srcSize)
 {
     ZSTD_compressionParameters const* cParams = &ms->cParams;
-    U32 const mls = cParams->searchLength;
+    U32 const mls = cParams->minMatch;
     switch(mls)
     {
     default: /* includes case 3 */
diff --git a/contrib/python-zstandard/zstd/compress/zstd_lazy.c b/contrib/python-zstandard/zstd/compress/zstd_lazy.c
--- a/contrib/python-zstandard/zstd/compress/zstd_lazy.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_lazy.c
@@ -63,12 +63,13 @@
 static void
 ZSTD_insertDUBT1(ZSTD_matchState_t* ms,
                  U32 current, const BYTE* inputEnd,
-                 U32 nbCompares, U32 btLow, const ZSTD_dictMode_e dictMode)
+                 U32 nbCompares, U32 btLow,
+                 const ZSTD_dictMode_e dictMode)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    U32*   const bt = ms->chainTable;
-    U32    const btLog  = cParams->chainLog - 1;
-    U32    const btMask = (1 << btLog) - 1;
+    U32* const bt = ms->chainTable;
+    U32  const btLog  = cParams->chainLog - 1;
+    U32  const btMask = (1 << btLog) - 1;
     size_t commonLengthSmaller=0, commonLengthLarger=0;
     const BYTE* const base = ms->window.base;
     const BYTE* const dictBase = ms->window.dictBase;
@@ -80,7 +81,7 @@
     const BYTE* match;
     U32* smallerPtr = bt + 2*(current&btMask);
     U32* largerPtr  = smallerPtr + 1;
-    U32 matchIndex = *smallerPtr;
+    U32 matchIndex = *smallerPtr;   /* this candidate is unsorted : next sorted candidate is reached through *smallerPtr, while *largerPtr contains previous unsorted candidate (which is already saved and can be overwritten) */
     U32 dummy32;   /* to be nullified at the end */
     U32 const windowLow = ms->window.lowLimit;
 
@@ -93,6 +94,9 @@
         U32* const nextPtr = bt + 2*(matchIndex & btMask);
         size_t matchLength = MIN(commonLengthSmaller, commonLengthLarger);   /* guaranteed minimum nb of common bytes */
         assert(matchIndex < current);
+        /* note : all candidates are now supposed sorted,
+         * but it's still possible to have nextPtr[1] == ZSTD_DUBT_UNSORTED_MARK
+         * when a real index has the same value as ZSTD_DUBT_UNSORTED_MARK */
 
         if ( (dictMode != ZSTD_extDict)
           || (matchIndex+matchLength >= dictLimit)  /* both in current segment*/
@@ -108,7 +112,7 @@
             match = dictBase + matchIndex;
             matchLength += ZSTD_count_2segments(ip+matchLength, match+matchLength, iend, dictEnd, prefixStart);
             if (matchIndex+matchLength >= dictLimit)
-                match = base + matchIndex;   /* to prepare for next usage of match[matchLength] */
+                match = base + matchIndex;   /* preparation for next read of match[matchLength] */
         }
 
         DEBUGLOG(8, "ZSTD_insertDUBT1: comparing %u with %u : found %u common bytes ",
@@ -147,6 +151,7 @@
         ZSTD_matchState_t* ms,
         const BYTE* const ip, const BYTE* const iend,
         size_t* offsetPtr,
+        size_t bestLength,
         U32 nbCompares,
         U32 const mls,
         const ZSTD_dictMode_e dictMode)
@@ -172,8 +177,7 @@
     U32         const btMask = (1 << btLog) - 1;
     U32         const btLow = (btMask >= dictHighLimit - dictLowLimit) ? dictLowLimit : dictHighLimit - btMask;
 
-    size_t commonLengthSmaller=0, commonLengthLarger=0, bestLength=0;
-    U32 matchEndIdx = current+8+1;
+    size_t commonLengthSmaller=0, commonLengthLarger=0;
 
     (void)dictMode;
     assert(dictMode == ZSTD_dictMatchState);
@@ -188,10 +192,8 @@
 
         if (matchLength > bestLength) {
             U32 matchIndex = dictMatchIndex + dictIndexDelta;
-            if (matchLength > matchEndIdx - matchIndex)
-                matchEndIdx = matchIndex + (U32)matchLength;
             if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbit32(current-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) ) {
-                DEBUGLOG(2, "ZSTD_DUBT_findBestDictMatch(%u) : found better match length %u -> %u and offsetCode %u -> %u (dictMatchIndex %u, matchIndex %u)",
+                DEBUGLOG(9, "ZSTD_DUBT_findBetterDictMatch(%u) : found better match length %u -> %u and offsetCode %u -> %u (dictMatchIndex %u, matchIndex %u)",
                     current, (U32)bestLength, (U32)matchLength, (U32)*offsetPtr, ZSTD_REP_MOVE + current - matchIndex, dictMatchIndex, matchIndex);
                 bestLength = matchLength, *offsetPtr = ZSTD_REP_MOVE + current - matchIndex;
             }
@@ -200,7 +202,6 @@
             }
         }
 
-        DEBUGLOG(2, "matchLength:%6zu, match:%p, prefixStart:%p, ip:%p", matchLength, match, prefixStart, ip);
         if (match[matchLength] < ip[matchLength]) {
             if (dictMatchIndex <= btLow) { break; }   /* beyond tree size, stop the search */
             commonLengthSmaller = matchLength;    /* all smaller will now have at least this guaranteed common length */
@@ -215,7 +216,7 @@
 
     if (bestLength >= MINMATCH) {
         U32 const mIndex = current - ((U32)*offsetPtr - ZSTD_REP_MOVE); (void)mIndex;
-        DEBUGLOG(2, "ZSTD_DUBT_findBestDictMatch(%u) : found match of length %u and offsetCode %u (pos %u)",
+        DEBUGLOG(8, "ZSTD_DUBT_findBetterDictMatch(%u) : found match of length %u and offsetCode %u (pos %u)",
                     current, (U32)bestLength, (U32)*offsetPtr, mIndex);
     }
     return bestLength;
@@ -261,7 +262,7 @@
          && (nbCandidates > 1) ) {
         DEBUGLOG(8, "ZSTD_DUBT_findBestMatch: candidate %u is unsorted",
                     matchIndex);
-        *unsortedMark = previousCandidate;
+        *unsortedMark = previousCandidate;  /* the unsortedMark becomes a reversed chain, to move up back to original position */
         previousCandidate = matchIndex;
         matchIndex = *nextCandidate;
         nextCandidate = bt + 2*(matchIndex&btMask);
@@ -269,11 +270,13 @@
         nbCandidates --;
     }
 
+    /* nullify last candidate if it's still unsorted
+     * simplification, detrimental to compression ratio, beneficial for speed */
     if ( (matchIndex > unsortLimit)
       && (*unsortedMark==ZSTD_DUBT_UNSORTED_MARK) ) {
         DEBUGLOG(7, "ZSTD_DUBT_findBestMatch: nullify last unsorted candidate %u",
                     matchIndex);
-        *nextCandidate = *unsortedMark = 0;   /* nullify next candidate if it's still unsorted (note : simplification, detrimental to compression ratio, beneficial for speed) */
+        *nextCandidate = *unsortedMark = 0;
     }
 
     /* batch sort stacked candidates */
@@ -288,14 +291,14 @@
     }
 
     /* find longest match */
-    {   size_t commonLengthSmaller=0, commonLengthLarger=0;
+    {   size_t commonLengthSmaller = 0, commonLengthLarger = 0;
         const BYTE* const dictBase = ms->window.dictBase;
         const U32 dictLimit = ms->window.dictLimit;
         const BYTE* const dictEnd = dictBase + dictLimit;
         const BYTE* const prefixStart = base + dictLimit;
         U32* smallerPtr = bt + 2*(current&btMask);
         U32* largerPtr  = bt + 2*(current&btMask) + 1;
-        U32 matchEndIdx = current+8+1;
+        U32 matchEndIdx = current + 8 + 1;
         U32 dummy32;   /* to be nullified at the end */
         size_t bestLength = 0;
 
@@ -323,6 +326,11 @@
                 if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbit32(current-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) )
                     bestLength = matchLength, *offsetPtr = ZSTD_REP_MOVE + current - matchIndex;
                 if (ip+matchLength == iend) {   /* equal : no way to know if inf or sup */
+                    if (dictMode == ZSTD_dictMatchState) {
+                        nbCompares = 0; /* in addition to avoiding checking any
+                                         * further in this loop, make sure we
+                                         * skip checking in the dictionary. */
+                    }
                     break;   /* drop, to guarantee consistency (miss a little bit of compression) */
                 }
             }
@@ -346,7 +354,10 @@
         *smallerPtr = *largerPtr = 0;
 
         if (dictMode == ZSTD_dictMatchState && nbCompares) {
-            bestLength = ZSTD_DUBT_findBetterDictMatch(ms, ip, iend, offsetPtr, nbCompares, mls, dictMode);
+            bestLength = ZSTD_DUBT_findBetterDictMatch(
+                    ms, ip, iend,
+                    offsetPtr, bestLength, nbCompares,
+                    mls, dictMode);
         }
 
         assert(matchEndIdx > current+8); /* ensure nextToUpdate is increased */
@@ -381,7 +392,7 @@
                             const BYTE* ip, const BYTE* const iLimit,
                                   size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_BtFindBestMatch(ms, ip, iLimit, offsetPtr, 4, ZSTD_noDict);
@@ -397,7 +408,7 @@
                         const BYTE* ip, const BYTE* const iLimit,
                         size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_BtFindBestMatch(ms, ip, iLimit, offsetPtr, 4, ZSTD_dictMatchState);
@@ -413,7 +424,7 @@
                         const BYTE* ip, const BYTE* const iLimit,
                         size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_BtFindBestMatch(ms, ip, iLimit, offsetPtr, 4, ZSTD_extDict);
@@ -428,7 +439,7 @@
 /* *********************************
 *  Hash Chain
 ***********************************/
-#define NEXT_IN_CHAIN(d, mask)   chainTable[(d) & mask]
+#define NEXT_IN_CHAIN(d, mask)   chainTable[(d) & (mask)]
 
 /* Update chains up to ip (excluded)
    Assumption : always within prefix (i.e. not within extDict) */
@@ -458,7 +469,7 @@
 
 U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip) {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.searchLength);
+    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.minMatch);
 }
 
 
@@ -492,6 +503,7 @@
         size_t currentMl=0;
         if ((dictMode != ZSTD_extDict) || matchIndex >= dictLimit) {
             const BYTE* const match = base + matchIndex;
+            assert(matchIndex >= dictLimit);   /* ensures this is true if dictMode != ZSTD_extDict */
             if (match[ml] == ip[ml])   /* potentially better */
                 currentMl = ZSTD_count(ip, match, iLimit);
         } else {
@@ -554,7 +566,7 @@
                         const BYTE* ip, const BYTE* const iLimit,
                         size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_HcFindBestMatch_generic(ms, ip, iLimit, offsetPtr, 4, ZSTD_noDict);
@@ -570,7 +582,7 @@
                         const BYTE* ip, const BYTE* const iLimit,
                         size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_HcFindBestMatch_generic(ms, ip, iLimit, offsetPtr, 4, ZSTD_dictMatchState);
@@ -586,7 +598,7 @@
                         const BYTE* ip, const BYTE* const iLimit,
                         size_t* offsetPtr)
 {
-    switch(ms->cParams.searchLength)
+    switch(ms->cParams.minMatch)
     {
     default : /* includes case 3 */
     case 4 : return ZSTD_HcFindBestMatch_generic(ms, ip, iLimit, offsetPtr, 4, ZSTD_extDict);
diff --git a/contrib/python-zstandard/zstd/compress/zstd_ldm.h b/contrib/python-zstandard/zstd/compress/zstd_ldm.h
--- a/contrib/python-zstandard/zstd/compress/zstd_ldm.h
+++ b/contrib/python-zstandard/zstd/compress/zstd_ldm.h
@@ -21,7 +21,7 @@
 *  Long distance matching
 ***************************************/
 
-#define ZSTD_LDM_DEFAULT_WINDOW_LOG ZSTD_WINDOWLOG_DEFAULTMAX
+#define ZSTD_LDM_DEFAULT_WINDOW_LOG ZSTD_WINDOWLOG_LIMIT_DEFAULT
 
 /**
  * ZSTD_ldm_generateSequences():
@@ -86,12 +86,8 @@
  */
 size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_t maxChunkSize);
 
-/** ZSTD_ldm_getTableSize() :
- *  Return prime8bytes^(minMatchLength-1) */
-U64 ZSTD_ldm_getHashPower(U32 minMatchLength);
-
 /** ZSTD_ldm_adjustParameters() :
- *  If the params->hashEveryLog is not set, set it to its default value based on
+ *  If the params->hashRateLog is not set, set it to its default value based on
  *  windowLog and params->hashLog.
  *
  *  Ensures that params->bucketSizeLog is <= params->hashLog (setting it to
diff --git a/contrib/python-zstandard/zstd/compress/zstd_ldm.c b/contrib/python-zstandard/zstd/compress/zstd_ldm.c
--- a/contrib/python-zstandard/zstd/compress/zstd_ldm.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_ldm.c
@@ -37,8 +37,8 @@
         params->hashLog = MAX(ZSTD_HASHLOG_MIN, params->windowLog - LDM_HASH_RLOG);
         assert(params->hashLog <= ZSTD_HASHLOG_MAX);
     }
-    if (params->hashEveryLog == 0) {
-        params->hashEveryLog = params->windowLog < params->hashLog
+    if (params->hashRateLog == 0) {
+        params->hashRateLog = params->windowLog < params->hashLog
                                    ? 0
                                    : params->windowLog - params->hashLog;
     }
@@ -119,20 +119,20 @@
  *
  *  Gets the small hash, checksum, and tag from the rollingHash.
  *
- *  If the tag matches (1 << ldmParams.hashEveryLog)-1, then
+ *  If the tag matches (1 << ldmParams.hashRateLog)-1, then
  *  creates an ldmEntry from the offset, and inserts it into the hash table.
  *
  *  hBits is the length of the small hash, which is the most significant hBits
  *  of rollingHash. The checksum is the next 32 most significant bits, followed
- *  by ldmParams.hashEveryLog bits that make up the tag. */
+ *  by ldmParams.hashRateLog bits that make up the tag. */
 static void ZSTD_ldm_makeEntryAndInsertByTag(ldmState_t* ldmState,
                                              U64 const rollingHash,
                                              U32 const hBits,
                                              U32 const offset,
                                              ldmParams_t const ldmParams)
 {
-    U32 const tag = ZSTD_ldm_getTag(rollingHash, hBits, ldmParams.hashEveryLog);
-    U32 const tagMask = ((U32)1 << ldmParams.hashEveryLog) - 1;
+    U32 const tag = ZSTD_ldm_getTag(rollingHash, hBits, ldmParams.hashRateLog);
+    U32 const tagMask = ((U32)1 << ldmParams.hashRateLog) - 1;
     if (tag == tagMask) {
         U32 const hash = ZSTD_ldm_getSmallHash(rollingHash, hBits);
         U32 const checksum = ZSTD_ldm_getChecksum(rollingHash, hBits);
@@ -143,56 +143,6 @@
     }
 }
 
-/** ZSTD_ldm_getRollingHash() :
- *  Get a 64-bit hash using the first len bytes from buf.
- *
- *  Giving bytes s = s_1, s_2, ... s_k, the hash is defined to be
- *  H(s) = s_1*(a^(k-1)) + s_2*(a^(k-2)) + ... + s_k*(a^0)
- *
- *  where the constant a is defined to be prime8bytes.
- *
- *  The implementation adds an offset to each byte, so
- *  H(s) = (s_1 + HASH_CHAR_OFFSET)*(a^(k-1)) + ... */
-static U64 ZSTD_ldm_getRollingHash(const BYTE* buf, U32 len)
-{
-    U64 ret = 0;
-    U32 i;
-    for (i = 0; i < len; i++) {
-        ret *= prime8bytes;
-        ret += buf[i] + LDM_HASH_CHAR_OFFSET;
-    }
-    return ret;
-}
-
-/** ZSTD_ldm_ipow() :
- *  Return base^exp. */
-static U64 ZSTD_ldm_ipow(U64 base, U64 exp)
-{
-    U64 ret = 1;
-    while (exp) {
-        if (exp & 1) { ret *= base; }
-        exp >>= 1;
-        base *= base;
-    }
-    return ret;
-}
-
-U64 ZSTD_ldm_getHashPower(U32 minMatchLength) {
-    DEBUGLOG(4, "ZSTD_ldm_getHashPower: mml=%u", minMatchLength);
-    assert(minMatchLength >= ZSTD_LDM_MINMATCH_MIN);
-    return ZSTD_ldm_ipow(prime8bytes, minMatchLength - 1);
-}
-
-/** ZSTD_ldm_updateHash() :
- *  Updates hash by removing toRemove and adding toAdd. */
-static U64 ZSTD_ldm_updateHash(U64 hash, BYTE toRemove, BYTE toAdd, U64 hashPower)
-{
-    hash -= ((toRemove + LDM_HASH_CHAR_OFFSET) * hashPower);
-    hash *= prime8bytes;
-    hash += toAdd + LDM_HASH_CHAR_OFFSET;
-    return hash;
-}
-
 /** ZSTD_ldm_countBackwardsMatch() :
  *  Returns the number of bytes that match backwards before pIn and pMatch.
  *
@@ -238,6 +188,7 @@
     case ZSTD_btlazy2:
     case ZSTD_btopt:
     case ZSTD_btultra:
+    case ZSTD_btultra2:
         break;
     default:
         assert(0);  /* not possible : not a valid strategy id */
@@ -261,9 +212,9 @@
     const BYTE* cur = lastHashed + 1;
 
     while (cur < iend) {
-        rollingHash = ZSTD_ldm_updateHash(rollingHash, cur[-1],
-                                          cur[ldmParams.minMatchLength-1],
-                                          state->hashPower);
+        rollingHash = ZSTD_rollingHash_rotate(rollingHash, cur[-1],
+                                              cur[ldmParams.minMatchLength-1],
+                                              state->hashPower);
         ZSTD_ldm_makeEntryAndInsertByTag(state,
                                          rollingHash, hBits,
                                          (U32)(cur - base), ldmParams);
@@ -297,8 +248,8 @@
     U64 const hashPower = ldmState->hashPower;
     U32 const hBits = params->hashLog - params->bucketSizeLog;
     U32 const ldmBucketSize = 1U << params->bucketSizeLog;
-    U32 const hashEveryLog = params->hashEveryLog;
-    U32 const ldmTagMask = (1U << params->hashEveryLog) - 1;
+    U32 const hashRateLog = params->hashRateLog;
+    U32 const ldmTagMask = (1U << params->hashRateLog) - 1;
     /* Prefix and extDict parameters */
     U32 const dictLimit = ldmState->window.dictLimit;
     U32 const lowestIndex = extDict ? ldmState->window.lowLimit : dictLimit;
@@ -324,16 +275,16 @@
         size_t forwardMatchLength = 0, backwardMatchLength = 0;
         ldmEntry_t* bestEntry = NULL;
         if (ip != istart) {
-            rollingHash = ZSTD_ldm_updateHash(rollingHash, lastHashed[0],
-                                              lastHashed[minMatchLength],
-                                              hashPower);
+            rollingHash = ZSTD_rollingHash_rotate(rollingHash, lastHashed[0],
+                                                  lastHashed[minMatchLength],
+                                                  hashPower);
         } else {
-            rollingHash = ZSTD_ldm_getRollingHash(ip, minMatchLength);
+            rollingHash = ZSTD_rollingHash_compute(ip, minMatchLength);
         }
         lastHashed = ip;
 
         /* Do not insert and do not look for a match */
-        if (ZSTD_ldm_getTag(rollingHash, hBits, hashEveryLog) != ldmTagMask) {
+        if (ZSTD_ldm_getTag(rollingHash, hBits, hashRateLog) != ldmTagMask) {
            ip++;
            continue;
         }
@@ -593,7 +544,7 @@
     void const* src, size_t srcSize)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    unsigned const minMatch = cParams->searchLength;
+    unsigned const minMatch = cParams->minMatch;
     ZSTD_blockCompressor const blockCompressor =
         ZSTD_selectBlockCompressor(cParams->strategy, ZSTD_matchState_dictMode(ms));
     /* Input bounds */
diff --git a/contrib/python-zstandard/zstd/compress/zstd_opt.h b/contrib/python-zstandard/zstd/compress/zstd_opt.h
--- a/contrib/python-zstandard/zstd/compress/zstd_opt.h
+++ b/contrib/python-zstandard/zstd/compress/zstd_opt.h
@@ -26,6 +26,10 @@
 size_t ZSTD_compressBlock_btultra(
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize);
+size_t ZSTD_compressBlock_btultra2(
+        ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+        void const* src, size_t srcSize);
+
 
 size_t ZSTD_compressBlock_btopt_dictMatchState(
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
@@ -41,6 +45,10 @@
         ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize);
 
+        /* note : no btultra2 variant for extDict nor dictMatchState,
+         * because btultra2 is not meant to work with dictionaries
+         * and is only specific for the first block (no prefix) */
+
 #if defined (__cplusplus)
 }
 #endif
diff --git a/contrib/python-zstandard/zstd/compress/zstd_opt.c b/contrib/python-zstandard/zstd/compress/zstd_opt.c
--- a/contrib/python-zstandard/zstd/compress/zstd_opt.c
+++ b/contrib/python-zstandard/zstd/compress/zstd_opt.c
@@ -17,6 +17,8 @@
 #define ZSTD_FREQ_DIV       4   /* log factor when using previous stats to init next stats */
 #define ZSTD_MAX_PRICE     (1<<30)
 
+#define ZSTD_PREDEF_THRESHOLD 1024   /* if srcSize < ZSTD_PREDEF_THRESHOLD, symbols' cost is assumed static, directly determined by pre-defined distributions */
+
 
 /*-*************************************
 *  Price functions for optimal parser
@@ -52,11 +54,15 @@
     return weight;
 }
 
-/* debugging function, @return price in bytes */
+#if (DEBUGLEVEL>=2)
+/* debugging function,
+ * @return price in bytes as fractional value
+ * for debug messages only */
 MEM_STATIC double ZSTD_fCost(U32 price)
 {
     return (double)price / (BITCOST_MULTIPLIER*8);
 }
+#endif
 
 static void ZSTD_setBasePrices(optState_t* optPtr, int optLevel)
 {
@@ -67,29 +73,44 @@
 }
 
 
-static U32 ZSTD_downscaleStat(U32* table, U32 lastEltIndex, int malus)
+/* ZSTD_downscaleStat() :
+ * reduce all elements in table by a factor 2^(ZSTD_FREQ_DIV+malus)
+ * return the resulting sum of elements */
+static U32 ZSTD_downscaleStat(unsigned* table, U32 lastEltIndex, int malus)
 {
     U32 s, sum=0;
+    DEBUGLOG(5, "ZSTD_downscaleStat (nbElts=%u)", (unsigned)lastEltIndex+1);
     assert(ZSTD_FREQ_DIV+malus > 0 && ZSTD_FREQ_DIV+malus < 31);
-    for (s=0; s<=lastEltIndex; s++) {
+    for (s=0; s<lastEltIndex+1; s++) {
         table[s] = 1 + (table[s] >> (ZSTD_FREQ_DIV+malus));
         sum += table[s];
     }
     return sum;
 }
 
-static void ZSTD_rescaleFreqs(optState_t* const optPtr,
-                              const BYTE* const src, size_t const srcSize,
-                              int optLevel)
+/* ZSTD_rescaleFreqs() :
+ * if first block (detected by optPtr->litLengthSum == 0) : init statistics
+ *    take hints from dictionary if there is one
+ *    or init from zero, using src for literals stats, or flat 1 for match symbols
+ * otherwise downscale existing stats, to be used as seed for next block.
+ */
+static void
+ZSTD_rescaleFreqs(optState_t* const optPtr,
+            const BYTE* const src, size_t const srcSize,
+                  int const optLevel)
 {
+    DEBUGLOG(5, "ZSTD_rescaleFreqs (srcSize=%u)", (unsigned)srcSize);
     optPtr->priceType = zop_dynamic;
 
     if (optPtr->litLengthSum == 0) {  /* first block : init */
-        if (srcSize <= 1024)   /* heuristic */
+        if (srcSize <= ZSTD_PREDEF_THRESHOLD) {  /* heuristic */
+            DEBUGLOG(5, "(srcSize <= ZSTD_PREDEF_THRESHOLD) => zop_predef");
             optPtr->priceType = zop_predef;
+        }
 
         assert(optPtr->symbolCosts != NULL);
-        if (optPtr->symbolCosts->huf.repeatMode == HUF_repeat_valid) { /* huffman table presumed generated by dictionary */
+        if (optPtr->symbolCosts->huf.repeatMode == HUF_repeat_valid) {
+            /* huffman table presumed generated by dictionary */
             optPtr->priceType = zop_dynamic;
 
             assert(optPtr->litFreq != NULL);
@@ -208,7 +229,9 @@
 
     /* dynamic statistics */
     {   U32 const llCode = ZSTD_LLcode(litLength);
-        return (LL_bits[llCode] * BITCOST_MULTIPLIER) + (optPtr->litLengthSumBasePrice - WEIGHT(optPtr->litLengthFreq[llCode], optLevel));
+        return (LL_bits[llCode] * BITCOST_MULTIPLIER)
+             + optPtr->litLengthSumBasePrice
+             - WEIGHT(optPtr->litLengthFreq[llCode], optLevel);
     }
 }
 
@@ -253,7 +276,7 @@
 FORCE_INLINE_TEMPLATE U32
 ZSTD_getMatchPrice(U32 const offset,
                    U32 const matchLength,
-                   const optState_t* const optPtr,
+             const optState_t* const optPtr,
                    int const optLevel)
 {
     U32 price;
@@ -385,7 +408,6 @@
     U32* largerPtr  = smallerPtr + 1;
     U32 dummy32;   /* to be nullified at the end */
     U32 const windowLow = ms->window.lowLimit;
-    U32 const matchLow = windowLow ? windowLow : 1;
     U32 matchEndIdx = current+8+1;
     size_t bestLength = 8;
     U32 nbCompares = 1U << cParams->searchLog;
@@ -401,7 +423,8 @@
     assert(ip <= iend-8);   /* required for h calculation */
     hashTable[h] = current;   /* Update Hash Table */
 
-    while (nbCompares-- && (matchIndex >= matchLow)) {
+    assert(windowLow > 0);
+    while (nbCompares-- && (matchIndex >= windowLow)) {
         U32* const nextPtr = bt + 2*(matchIndex & btMask);
         size_t matchLength = MIN(commonLengthSmaller, commonLengthLarger);   /* guaranteed minimum nb of common bytes */
         assert(matchIndex < current);
@@ -479,7 +502,7 @@
     const BYTE* const base = ms->window.base;
     U32 const target = (U32)(ip - base);
     U32 idx = ms->nextToUpdate;
-    DEBUGLOG(5, "ZSTD_updateTree_internal, from %u to %u  (dictMode:%u)",
+    DEBUGLOG(6, "ZSTD_updateTree_internal, from %u to %u  (dictMode:%u)",
                 idx, target, dictMode);
 
     while(idx < target)
@@ -488,15 +511,18 @@
 }
 
 void ZSTD_updateTree(ZSTD_matchState_t* ms, const BYTE* ip, const BYTE* iend) {
-    ZSTD_updateTree_internal(ms, ip, iend, ms->cParams.searchLength, ZSTD_noDict);
+    ZSTD_updateTree_internal(ms, ip, iend, ms->cParams.minMatch, ZSTD_noDict);
 }
 
 FORCE_INLINE_TEMPLATE
 U32 ZSTD_insertBtAndGetAllMatches (
                     ZSTD_matchState_t* ms,
                     const BYTE* const ip, const BYTE* const iLimit, const ZSTD_dictMode_e dictMode,
-                    U32 rep[ZSTD_REP_NUM], U32 const ll0,
-                    ZSTD_match_t* matches, const U32 lengthToBeat, U32 const mls /* template */)
+                    U32 rep[ZSTD_REP_NUM],
+                    U32 const ll0,   /* tells if associated literal length is 0 or not. This value must be 0 or 1 */
+                    ZSTD_match_t* matches,
+                    const U32 lengthToBeat,
+                    U32 const mls /* template */)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
     U32 const sufficient_len = MIN(cParams->targetLength, ZSTD_OPT_NUM -1);
@@ -542,6 +568,7 @@
     DEBUGLOG(8, "ZSTD_insertBtAndGetAllMatches: current=%u", current);
 
     /* check repCode */
+    assert(ll0 <= 1);   /* necessarily 1 or 0 */
     {   U32 const lastR = ZSTD_REP_NUM + ll0;
         U32 repCode;
         for (repCode = ll0; repCode < lastR; repCode++) {
@@ -724,7 +751,7 @@
                         ZSTD_match_t* matches, U32 const lengthToBeat)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    U32 const matchLengthSearch = cParams->searchLength;
+    U32 const matchLengthSearch = cParams->minMatch;
     DEBUGLOG(8, "ZSTD_BtGetAllMatches");
     if (ip < ms->window.base + ms->nextToUpdate) return 0;   /* skipped area */
     ZSTD_updateTree_internal(ms, ip, iHighLimit, matchLengthSearch, dictMode);
@@ -774,12 +801,30 @@
     return sol.litlen + sol.mlen;
 }
 
+#if 0 /* debug */
+
+static void
+listStats(const U32* table, int lastEltID)
+{
+    int const nbElts = lastEltID + 1;
+    int enb;
+    for (enb=0; enb < nbElts; enb++) {
+        (void)table;
+        //RAWLOG(2, "%3i:%3i,  ", enb, table[enb]);
+        RAWLOG(2, "%4i,", table[enb]);
+    }
+    RAWLOG(2, " \n");
+}
+
+#endif
+
 FORCE_INLINE_TEMPLATE size_t
 ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms,
                                seqStore_t* seqStore,
                                U32 rep[ZSTD_REP_NUM],
-                               const void* src, size_t srcSize,
-                               const int optLevel, const ZSTD_dictMode_e dictMode)
+                         const void* src, size_t srcSize,
+                         const int optLevel,
+                         const ZSTD_dictMode_e dictMode)
 {
     optState_t* const optStatePtr = &ms->opt;
     const BYTE* const istart = (const BYTE*)src;
@@ -792,14 +837,15 @@
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
 
     U32 const sufficient_len = MIN(cParams->targetLength, ZSTD_OPT_NUM -1);
-    U32 const minMatch = (cParams->searchLength == 3) ? 3 : 4;
+    U32 const minMatch = (cParams->minMatch == 3) ? 3 : 4;
 
     ZSTD_optimal_t* const opt = optStatePtr->priceTable;
     ZSTD_match_t* const matches = optStatePtr->matchTable;
     ZSTD_optimal_t lastSequence;
 
     /* init */
-    DEBUGLOG(5, "ZSTD_compressBlock_opt_generic");
+    DEBUGLOG(5, "ZSTD_compressBlock_opt_generic: current=%u, prefix=%u, nextToUpdate=%u",
+                (U32)(ip - base), ms->window.dictLimit, ms->nextToUpdate);
     assert(optLevel <= 2);
     ms->nextToUpdate3 = ms->nextToUpdate;
     ZSTD_rescaleFreqs(optStatePtr, (const BYTE*)src, srcSize, optLevel);
@@ -999,7 +1045,7 @@
                     U32 const offCode = opt[storePos].off;
                     U32 const advance = llen + mlen;
                     DEBUGLOG(6, "considering seq starting at %zi, llen=%u, mlen=%u",
-                                anchor - istart, llen, mlen);
+                                anchor - istart, (unsigned)llen, (unsigned)mlen);
 
                     if (mlen==0) {  /* only literals => must be last "sequence", actually starting a new stream of sequences */
                         assert(storePos == storeEnd);   /* must be last sequence */
@@ -1047,11 +1093,11 @@
 
 
 /* used in 2-pass strategy */
-static U32 ZSTD_upscaleStat(U32* table, U32 lastEltIndex, int bonus)
+static U32 ZSTD_upscaleStat(unsigned* table, U32 lastEltIndex, int bonus)
 {
     U32 s, sum=0;
-    assert(ZSTD_FREQ_DIV+bonus > 0);
-    for (s=0; s<=lastEltIndex; s++) {
+    assert(ZSTD_FREQ_DIV+bonus >= 0);
+    for (s=0; s<lastEltIndex+1; s++) {
         table[s] <<= ZSTD_FREQ_DIV+bonus;
         table[s]--;
         sum += table[s];
@@ -1063,9 +1109,43 @@
 MEM_STATIC void ZSTD_upscaleStats(optState_t* optPtr)
 {
     optPtr->litSum = ZSTD_upscaleStat(optPtr->litFreq, MaxLit, 0);
-    optPtr->litLengthSum = ZSTD_upscaleStat(optPtr->litLengthFreq, MaxLL, 1);
-    optPtr->matchLengthSum = ZSTD_upscaleStat(optPtr->matchLengthFreq, MaxML, 1);
-    optPtr->offCodeSum = ZSTD_upscaleStat(optPtr->offCodeFreq, MaxOff, 1);
+    optPtr->litLengthSum = ZSTD_upscaleStat(optPtr->litLengthFreq, MaxLL, 0);
+    optPtr->matchLengthSum = ZSTD_upscaleStat(optPtr->matchLengthFreq, MaxML, 0);
+    optPtr->offCodeSum = ZSTD_upscaleStat(optPtr->offCodeFreq, MaxOff, 0);
+}
+
+/* ZSTD_initStats_ultra():
+ * make a first compression pass, just to seed stats with more accurate starting values.
+ * only works on first block, with no dictionary and no ldm.
+ * this function cannot error, hence its constract must be respected.
+ */
+static void
+ZSTD_initStats_ultra(ZSTD_matchState_t* ms,
+                     seqStore_t* seqStore,
+                     U32 rep[ZSTD_REP_NUM],
+               const void* src, size_t srcSize)
+{
+    U32 tmpRep[ZSTD_REP_NUM];  /* updated rep codes will sink here */
+    memcpy(tmpRep, rep, sizeof(tmpRep));
+
+    DEBUGLOG(4, "ZSTD_initStats_ultra (srcSize=%zu)", srcSize);
+    assert(ms->opt.litLengthSum == 0);    /* first block */
+    assert(seqStore->sequences == seqStore->sequencesStart);   /* no ldm */
+    assert(ms->window.dictLimit == ms->window.lowLimit);   /* no dictionary */
+    assert(ms->window.dictLimit - ms->nextToUpdate <= 1);  /* no prefix (note: intentional overflow, defined as 2-complement) */
+
+    ZSTD_compressBlock_opt_generic(ms, seqStore, tmpRep, src, srcSize, 2 /*optLevel*/, ZSTD_noDict);   /* generate stats into ms->opt*/
+
+    /* invalidate first scan from history */
+    ZSTD_resetSeqStore(seqStore);
+    ms->window.base -= srcSize;
+    ms->window.dictLimit += (U32)srcSize;
+    ms->window.lowLimit = ms->window.dictLimit;
+    ms->nextToUpdate = ms->window.dictLimit;
+    ms->nextToUpdate3 = ms->window.dictLimit;
+
+    /* re-inforce weight of collected statistics */
+    ZSTD_upscaleStats(&ms->opt);
 }
 
 size_t ZSTD_compressBlock_btultra(
@@ -1073,33 +1153,34 @@
         const void* src, size_t srcSize)
 {
     DEBUGLOG(5, "ZSTD_compressBlock_btultra (srcSize=%zu)", srcSize);
-#if 0
-    /* 2-pass strategy (disabled)
+    return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize, 2 /*optLevel*/, ZSTD_noDict);
+}
+
+size_t ZSTD_compressBlock_btultra2(
+        ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+        const void* src, size_t srcSize)
+{
+    U32 const current = (U32)((const BYTE*)src - ms->window.base);
+    DEBUGLOG(5, "ZSTD_compressBlock_btultra2 (srcSize=%zu)", srcSize);
+
+    /* 2-pass strategy:
      * this strategy makes a first pass over first block to collect statistics
      * and seed next round's statistics with it.
+     * After 1st pass, function forgets everything, and starts a new block.
+     * Consequently, this can only work if no data has been previously loaded in tables,
+     * aka, no dictionary, no prefix, no ldm preprocessing.
      * The compression ratio gain is generally small (~0.5% on first block),
      * the cost is 2x cpu time on first block. */
     assert(srcSize <= ZSTD_BLOCKSIZE_MAX);
     if ( (ms->opt.litLengthSum==0)   /* first block */
-      && (seqStore->sequences == seqStore->sequencesStart)   /* no ldm */
-      && (ms->window.dictLimit == ms->window.lowLimit) ) {   /* no dictionary */
-        U32 tmpRep[ZSTD_REP_NUM];
-        DEBUGLOG(5, "ZSTD_compressBlock_btultra: first block: collecting statistics");
-        assert(ms->nextToUpdate >= ms->window.dictLimit
-            && ms->nextToUpdate <= ms->window.dictLimit + 1);
-        memcpy(tmpRep, rep, sizeof(tmpRep));
-        ZSTD_compressBlock_opt_generic(ms, seqStore, tmpRep, src, srcSize, 2 /*optLevel*/, ZSTD_noDict);   /* generate stats into ms->opt*/
-        ZSTD_resetSeqStore(seqStore);
-        /* invalidate first scan from history */
-        ms->window.base -= srcSize;
-        ms->window.dictLimit += (U32)srcSize;
-        ms->window.lowLimit = ms->window.dictLimit;
-        ms->nextToUpdate = ms->window.dictLimit;
-        ms->nextToUpdate3 = ms->window.dictLimit;
-        /* re-inforce weight of collected statistics */
-        ZSTD_upscaleStats(&ms->opt);
+      && (seqStore->sequences == seqStore->sequencesStart)  /* no ldm */
+      && (ms->window.dictLimit == ms->window.lowLimit)   /* no dictionary */
+      && (current == ms->window.dictLimit)   /* start of frame, nothing already loaded nor skipped */
+      && (srcSize > ZSTD_PREDEF_THRESHOLD)
+      ) {
+        ZSTD_initStats_ultra(ms, seqStore, rep, src, srcSize);
     }
-#endif
+
     return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize, 2 /*optLevel*/, ZSTD_noDict);
 }
 
@@ -1130,3 +1211,7 @@
 {
     return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize, 2 /*optLevel*/, ZSTD_extDict);
 }
+
+/* note : no btultra2 variant for extDict nor dictMatchState,
+ * because btultra2 is not meant to work with dictionaries
+ * and is only specific for the first block (no prefix) */
diff --git a/contrib/python-zstandard/zstd/compress/zstdmt_compress.h b/contrib/python-zstandard/zstd/compress/zstdmt_compress.h
--- a/contrib/python-zstandard/zstd/compress/zstdmt_compress.h
+++ b/contrib/python-zstandard/zstd/compress/zstdmt_compress.h
@@ -28,6 +28,16 @@
 #include "zstd.h"            /* ZSTD_inBuffer, ZSTD_outBuffer, ZSTDLIB_API */
 
 
+/* ===   Constants   === */
+#ifndef ZSTDMT_NBWORKERS_MAX
+#  define ZSTDMT_NBWORKERS_MAX 200
+#endif
+#ifndef ZSTDMT_JOBSIZE_MIN
+#  define ZSTDMT_JOBSIZE_MIN (1 MB)
+#endif
+#define ZSTDMT_JOBSIZE_MAX  (MEM_32bits() ? (512 MB) : (1024 MB))
+
+
 /* ===   Memory management   === */
 typedef struct ZSTDMT_CCtx_s ZSTDMT_CCtx;
 ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers);
@@ -52,6 +62,7 @@
 ZSTDLIB_API size_t ZSTDMT_initCStream(ZSTDMT_CCtx* mtctx, int compressionLevel);
 ZSTDLIB_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize);  /**< if srcSize is not known at reset time, use ZSTD_CONTENTSIZE_UNKNOWN. Note: for compatibility with older programs, 0 means the same as ZSTD_CONTENTSIZE_UNKNOWN, but it will change in the future to mean "empty" */
 
+ZSTDLIB_API size_t ZSTDMT_nextInputSizeHint(const ZSTDMT_CCtx* mtctx);
 ZSTDLIB_API size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
 
 ZSTDLIB_API size_t ZSTDMT_flushStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output);   /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
@@ -60,16 +71,12 @@
 
 /* ===   Advanced functions and parameters  === */
 
-#ifndef ZSTDMT_JOBSIZE_MIN
-#  define ZSTDMT_JOBSIZE_MIN (1U << 20)   /* 1 MB - Minimum size of each compression job */
-#endif
-
 ZSTDLIB_API size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
                                            void* dst, size_t dstCapacity,
                                      const void* src, size_t srcSize,
                                      const ZSTD_CDict* cdict,
                                            ZSTD_parameters params,
-                                           unsigned overlapLog);
+                                           int overlapLog);
 
 ZSTDLIB_API size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* mtctx,
                                         const void* dict, size_t dictSize,   /* dict can be released after init, a local copy is preserved within zcs */
@@ -84,8 +91,9 @@
 /* ZSTDMT_parameter :
  * List of parameters that can be set using ZSTDMT_setMTCtxParameter() */
 typedef enum {
-    ZSTDMT_p_jobSize,           /* Each job is compressed in parallel. By default, this value is dynamically determined depending on compression parameters. Can be set explicitly here. */
-    ZSTDMT_p_overlapSectionLog  /* Each job may reload a part of previous job to enhance compressionr ratio; 0 == no overlap, 6(default) == use 1/8th of window, >=9 == use full window. This is a "sticky" parameter : its value will be re-used on next compression job */
+    ZSTDMT_p_jobSize,     /* Each job is compressed in parallel. By default, this value is dynamically determined depending on compression parameters. Can be set explicitly here. */
+    ZSTDMT_p_overlapLog,  /* Each job may reload a part of previous job to enhance compressionr ratio; 0 == no overlap, 6(default) == use 1/8th of window, >=9 == use full window. This is a "sticky" parameter : its value will be re-used on next compression job */
+    ZSTDMT_p_rsyncable    /* Enables rsyncable mode. */
 } ZSTDMT_parameter;
 
 /* ZSTDMT_setMTCtxParameter() :
@@ -93,12 +101,12 @@
  * The function must be called typically after ZSTD_createCCtx() but __before ZSTDMT_init*() !__
  * Parameters not explicitly reset by ZSTDMT_init*() remain the same in consecutive compression sessions.
  * @return : 0, or an error code (which can be tested using ZSTD_isError()) */
-ZSTDLIB_API size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, unsigned value);
+ZSTDLIB_API size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int value);
 
 /* ZSTDMT_getMTCtxParameter() :
  * Query the ZSTDMT_CCtx for a parameter value.
  * @return : 0, or an error code (which can be tested using ZSTD_isError()) */
-ZSTDLIB_API size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, unsigned* value);
+ZSTDLIB_API size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int* value);
 
 
 /*! ZSTDMT_compressStream_generic() :
@@ -129,7 +137,7 @@
 
 /*! ZSTDMT_CCtxParam_setMTCtxParameter()
  *  like ZSTDMT_setMTCtxParameter(), but into a ZSTD_CCtx_Params */
-size_t ZSTDMT_CCtxParam_setMTCtxParameter(ZSTD_CCtx_params* params, ZSTDMT_parameter parameter, unsigned value);
+size_t ZSTDMT_CCtxParam_setMTCtxParameter(ZSTD_CCtx_params* params, ZSTDMT_parameter parameter, int value);
 
 /*! ZSTDMT_CCtxParam_setNbWorkers()
  *  Set nbWorkers, and clamp it.
diff --git a/contrib/python-zstandard/zstd/compress/zstdmt_compress.c b/contrib/python-zstandard/zstd/compress/zstdmt_compress.c
--- a/contrib/python-zstandard/zstd/compress/zstdmt_compress.c
+++ b/contrib/python-zstandard/zstd/compress/zstdmt_compress.c
@@ -9,21 +9,19 @@
  */
 
 
-/* ======   Tuning parameters   ====== */
-#define ZSTDMT_NBWORKERS_MAX 200
-#define ZSTDMT_JOBSIZE_MAX  (MEM_32bits() ? (512 MB) : (2 GB))  /* note : limited by `jobSize` type, which is `unsigned` */
-#define ZSTDMT_OVERLAPLOG_DEFAULT 6
-
-
 /* ======   Compiler specifics   ====== */
 #if defined(_MSC_VER)
 #  pragma warning(disable : 4204)   /* disable: C4204: non-constant aggregate initializer */
 #endif
 
 
+/* ======   Constants   ====== */
+#define ZSTDMT_OVERLAPLOG_DEFAULT 0
+
+
 /* ======   Dependencies   ====== */
 #include <string.h>      /* memcpy, memset */
-#include <limits.h>      /* INT_MAX */
+#include <limits.h>      /* INT_MAX, UINT_MAX */
 #include "pool.h"        /* threadpool */
 #include "threading.h"   /* mutex */
 #include "zstd_compress_internal.h"  /* MIN, ERROR, ZSTD_*, ZSTD_highbit32 */
@@ -57,9 +55,9 @@
    static clock_t _ticksPerSecond = 0;
    if (_ticksPerSecond <= 0) _ticksPerSecond = sysconf(_SC_CLK_TCK);
 
-   { struct tms junk; clock_t newTicks = (clock_t) times(&junk);
-     return ((((unsigned long long)newTicks)*(1000000))/_ticksPerSecond); }
-}
+   {   struct tms junk; clock_t newTicks = (clock_t) times(&junk);
+       return ((((unsigned long long)newTicks)*(1000000))/_ticksPerSecond);
+}  }
 
 #define MUTEX_WAIT_TIME_DLEVEL 6
 #define ZSTD_PTHREAD_MUTEX_LOCK(mutex) {          \
@@ -342,8 +340,8 @@
 
 typedef struct {
     ZSTD_pthread_mutex_t poolMutex;
-    unsigned totalCCtx;
-    unsigned availCCtx;
+    int totalCCtx;
+    int availCCtx;
     ZSTD_customMem cMem;
     ZSTD_CCtx* cctx[1];   /* variable size */
 } ZSTDMT_CCtxPool;
@@ -351,16 +349,16 @@
 /* note : all CCtx borrowed from the pool should be released back to the pool _before_ freeing the pool */
 static void ZSTDMT_freeCCtxPool(ZSTDMT_CCtxPool* pool)
 {
-    unsigned u;
-    for (u=0; u<pool->totalCCtx; u++)
-        ZSTD_freeCCtx(pool->cctx[u]);  /* note : compatible with free on NULL */
+    int cid;
+    for (cid=0; cid<pool->totalCCtx; cid++)
+        ZSTD_freeCCtx(pool->cctx[cid]);  /* note : compatible with free on NULL */
     ZSTD_pthread_mutex_destroy(&pool->poolMutex);
     ZSTD_free(pool, pool->cMem);
 }
 
 /* ZSTDMT_createCCtxPool() :
  * implies nbWorkers >= 1 , checked by caller ZSTDMT_createCCtx() */
-static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(unsigned nbWorkers,
+static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(int nbWorkers,
                                               ZSTD_customMem cMem)
 {
     ZSTDMT_CCtxPool* const cctxPool = (ZSTDMT_CCtxPool*) ZSTD_calloc(
@@ -381,7 +379,7 @@
 }
 
 static ZSTDMT_CCtxPool* ZSTDMT_expandCCtxPool(ZSTDMT_CCtxPool* srcPool,
-                                              unsigned nbWorkers)
+                                              int nbWorkers)
 {
     if (srcPool==NULL) return NULL;
     if (nbWorkers <= srcPool->totalCCtx) return srcPool;   /* good enough */
@@ -469,9 +467,9 @@
         DEBUGLOG(4, "LDM window size = %u KB", (1U << params.cParams.windowLog) >> 10);
         ZSTD_ldm_adjustParameters(&params.ldmParams, &params.cParams);
         assert(params.ldmParams.hashLog >= params.ldmParams.bucketSizeLog);
-        assert(params.ldmParams.hashEveryLog < 32);
+        assert(params.ldmParams.hashRateLog < 32);
         serialState->ldmState.hashPower =
-                ZSTD_ldm_getHashPower(params.ldmParams.minMatchLength);
+                ZSTD_rollingHash_primePower(params.ldmParams.minMatchLength);
     } else {
         memset(&params.ldmParams, 0, sizeof(params.ldmParams));
     }
@@ -674,7 +672,7 @@
         if (ZSTD_isError(initError)) JOB_ERROR(initError);
     } else {  /* srcStart points at reloaded section */
         U64 const pledgedSrcSize = job->firstJob ? job->fullFrameSize : job->src.size;
-        {   size_t const forceWindowError = ZSTD_CCtxParam_setParameter(&jobParams, ZSTD_p_forceMaxWindow, !job->firstJob);
+        {   size_t const forceWindowError = ZSTD_CCtxParam_setParameter(&jobParams, ZSTD_c_forceMaxWindow, !job->firstJob);
             if (ZSTD_isError(forceWindowError)) JOB_ERROR(forceWindowError);
         }
         {   size_t const initError = ZSTD_compressBegin_advanced_internal(cctx,
@@ -777,6 +775,14 @@
 
 static const roundBuff_t kNullRoundBuff = {NULL, 0, 0};
 
+#define RSYNC_LENGTH 32
+
+typedef struct {
+  U64 hash;
+  U64 hitMask;
+  U64 primePower;
+} rsyncState_t;
+
 struct ZSTDMT_CCtx_s {
     POOL_ctx* factory;
     ZSTDMT_jobDescription* jobs;
@@ -790,6 +796,7 @@
     inBuff_t inBuff;
     roundBuff_t roundBuff;
     serialState_t serial;
+    rsyncState_t rsync;
     unsigned singleBlockingThread;
     unsigned jobIDMask;
     unsigned doneJobID;
@@ -859,7 +866,7 @@
 {
     if (nbWorkers > ZSTDMT_NBWORKERS_MAX) nbWorkers = ZSTDMT_NBWORKERS_MAX;
     params->nbWorkers = nbWorkers;
-    params->overlapSizeLog = ZSTDMT_OVERLAPLOG_DEFAULT;
+    params->overlapLog = ZSTDMT_OVERLAPLOG_DEFAULT;
     params->jobSize = 0;
     return nbWorkers;
 }
@@ -969,52 +976,59 @@
 }
 
 /* Internal only */
-size_t ZSTDMT_CCtxParam_setMTCtxParameter(ZSTD_CCtx_params* params,
-                                ZSTDMT_parameter parameter, unsigned value) {
+size_t
+ZSTDMT_CCtxParam_setMTCtxParameter(ZSTD_CCtx_params* params,
+                                   ZSTDMT_parameter parameter,
+                                   int value)
+{
     DEBUGLOG(4, "ZSTDMT_CCtxParam_setMTCtxParameter");
     switch(parameter)
     {
     case ZSTDMT_p_jobSize :
-        DEBUGLOG(4, "ZSTDMT_CCtxParam_setMTCtxParameter : set jobSize to %u", value);
-        if ( (value > 0)  /* value==0 => automatic job size */
-           & (value < ZSTDMT_JOBSIZE_MIN) )
+        DEBUGLOG(4, "ZSTDMT_CCtxParam_setMTCtxParameter : set jobSize to %i", value);
+        if ( value != 0  /* default */
+          && value < ZSTDMT_JOBSIZE_MIN)
             value = ZSTDMT_JOBSIZE_MIN;
-        if (value > ZSTDMT_JOBSIZE_MAX)
-            value = ZSTDMT_JOBSIZE_MAX;
+        assert(value >= 0);
+        if (value > ZSTDMT_JOBSIZE_MAX) value = ZSTDMT_JOBSIZE_MAX;
         params->jobSize = value;
         return value;
-    case ZSTDMT_p_overlapSectionLog :
-        if (value > 9) value = 9;
-        DEBUGLOG(4, "ZSTDMT_p_overlapSectionLog : %u", value);
-        params->overlapSizeLog = (value >= 9) ? 9 : value;
+
+    case ZSTDMT_p_overlapLog :
+        DEBUGLOG(4, "ZSTDMT_p_overlapLog : %i", value);
+        if (value < ZSTD_OVERLAPLOG_MIN) value = ZSTD_OVERLAPLOG_MIN;
+        if (value > ZSTD_OVERLAPLOG_MAX) value = ZSTD_OVERLAPLOG_MAX;
+        params->overlapLog = value;
         return value;
+
+    case ZSTDMT_p_rsyncable :
+        value = (value != 0);
+        params->rsyncable = value;
+        return value;
+
     default :
         return ERROR(parameter_unsupported);
     }
 }
 
-size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, unsigned value)
+size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int value)
 {
     DEBUGLOG(4, "ZSTDMT_setMTCtxParameter");
-    switch(parameter)
-    {
-    case ZSTDMT_p_jobSize :
-        return ZSTDMT_CCtxParam_setMTCtxParameter(&mtctx->params, parameter, value);
-    case ZSTDMT_p_overlapSectionLog :
-        return ZSTDMT_CCtxParam_setMTCtxParameter(&mtctx->params, parameter, value);
-    default :
-        return ERROR(parameter_unsupported);
-    }
+    return ZSTDMT_CCtxParam_setMTCtxParameter(&mtctx->params, parameter, value);
 }
 
-size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, unsigned* value)
+size_t ZSTDMT_getMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter, int* value)
 {
     switch (parameter) {
     case ZSTDMT_p_jobSize:
-        *value = mtctx->params.jobSize;
+        assert(mtctx->params.jobSize <= INT_MAX);
+        *value = (int)(mtctx->params.jobSize);
         break;
-    case ZSTDMT_p_overlapSectionLog:
-        *value = mtctx->params.overlapSizeLog;
+    case ZSTDMT_p_overlapLog:
+        *value = mtctx->params.overlapLog;
+        break;
+    case ZSTDMT_p_rsyncable:
+        *value = mtctx->params.rsyncable;
         break;
     default:
         return ERROR(parameter_unsupported);
@@ -1140,22 +1154,66 @@
 /* =====   Multi-threaded compression   ===== */
 /* ------------------------------------------ */
 
-static size_t ZSTDMT_computeTargetJobLog(ZSTD_CCtx_params const params)
+static unsigned ZSTDMT_computeTargetJobLog(ZSTD_CCtx_params const params)
 {
     if (params.ldmParams.enableLdm)
+        /* In Long Range Mode, the windowLog is typically oversized.
+         * In which case, it's preferable to determine the jobSize
+         * based on chainLog instead. */
         return MAX(21, params.cParams.chainLog + 4);
     return MAX(20, params.cParams.windowLog + 2);
 }
 
-static size_t ZSTDMT_computeOverlapLog(ZSTD_CCtx_params const params)
+static int ZSTDMT_overlapLog_default(ZSTD_strategy strat)
 {
-    unsigned const overlapRLog = (params.overlapSizeLog>9) ? 0 : 9-params.overlapSizeLog;
-    if (params.ldmParams.enableLdm)
-        return (MIN(params.cParams.windowLog, ZSTDMT_computeTargetJobLog(params) - 2) - overlapRLog);
-    return overlapRLog >= 9 ? 0 : (params.cParams.windowLog - overlapRLog);
+    switch(strat)
+    {
+        case ZSTD_btultra2:
+            return 9;
+        case ZSTD_btultra:
+        case ZSTD_btopt:
+            return 8;
+        case ZSTD_btlazy2:
+        case ZSTD_lazy2:
+            return 7;
+        case ZSTD_lazy:
+        case ZSTD_greedy:
+        case ZSTD_dfast:
+        case ZSTD_fast:
+        default:;
+    }
+    return 6;
 }
 
-static unsigned ZSTDMT_computeNbJobs(ZSTD_CCtx_params params, size_t srcSize, unsigned nbWorkers) {
+static int ZSTDMT_overlapLog(int ovlog, ZSTD_strategy strat)
+{
+    assert(0 <= ovlog && ovlog <= 9);
+    if (ovlog == 0) return ZSTDMT_overlapLog_default(strat);
+    return ovlog;
+}
+
+static size_t ZSTDMT_computeOverlapSize(ZSTD_CCtx_params const params)
+{
+    int const overlapRLog = 9 - ZSTDMT_overlapLog(params.overlapLog, params.cParams.strategy);
+    int ovLog = (overlapRLog >= 8) ? 0 : (params.cParams.windowLog - overlapRLog);
+    assert(0 <= overlapRLog && overlapRLog <= 8);
+    if (params.ldmParams.enableLdm) {
+        /* In Long Range Mode, the windowLog is typically oversized.
+         * In which case, it's preferable to determine the jobSize
+         * based on chainLog instead.
+         * Then, ovLog becomes a fraction of the jobSize, rather than windowSize */
+        ovLog = MIN(params.cParams.windowLog, ZSTDMT_computeTargetJobLog(params) - 2)
+                - overlapRLog;
+    }
+    assert(0 <= ovLog && ovLog <= 30);
+    DEBUGLOG(4, "overlapLog : %i", params.overlapLog);
+    DEBUGLOG(4, "overlap size : %i", 1 << ovLog);
+    return (ovLog==0) ? 0 : (size_t)1 << ovLog;
+}
+
+static unsigned
+ZSTDMT_computeNbJobs(ZSTD_CCtx_params params, size_t srcSize, unsigned nbWorkers)
+{
     assert(nbWorkers>0);
     {   size_t const jobSizeTarget = (size_t)1 << ZSTDMT_computeTargetJobLog(params);
         size_t const jobMaxSize = jobSizeTarget << 2;
@@ -1178,7 +1236,7 @@
                 ZSTD_CCtx_params params)
 {
     ZSTD_CCtx_params const jobParams = ZSTDMT_initJobCCtxParams(params);
-    size_t const overlapSize = (size_t)1 << ZSTDMT_computeOverlapLog(params);
+    size_t const overlapSize = ZSTDMT_computeOverlapSize(params);
     unsigned const nbJobs = ZSTDMT_computeNbJobs(params, srcSize, params.nbWorkers);
     size_t const proposedJobSize = (srcSize + (nbJobs-1)) / nbJobs;
     size_t const avgJobSize = (((proposedJobSize-1) & 0x1FFFF) < 0x7FFF) ? proposedJobSize + 0xFFFF : proposedJobSize;   /* avoid too small last block */
@@ -1289,16 +1347,17 @@
 }
 
 size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
-                               void* dst, size_t dstCapacity,
-                         const void* src, size_t srcSize,
-                         const ZSTD_CDict* cdict,
-                               ZSTD_parameters params,
-                               unsigned overlapLog)
+                                void* dst, size_t dstCapacity,
+                          const void* src, size_t srcSize,
+                          const ZSTD_CDict* cdict,
+                                ZSTD_parameters params,
+                                int overlapLog)
 {
     ZSTD_CCtx_params cctxParams = mtctx->params;
     cctxParams.cParams = params.cParams;
     cctxParams.fParams = params.fParams;
-    cctxParams.overlapSizeLog = overlapLog;
+    assert(ZSTD_OVERLAPLOG_MIN <= overlapLog && overlapLog <= ZSTD_OVERLAPLOG_MAX);
+    cctxParams.overlapLog = overlapLog;
     return ZSTDMT_compress_advanced_internal(mtctx,
                                              dst, dstCapacity,
                                              src, srcSize,
@@ -1311,8 +1370,8 @@
                      const void* src, size_t srcSize,
                            int compressionLevel)
 {
-    U32 const overlapLog = (compressionLevel >= ZSTD_maxCLevel()) ? 9 : ZSTDMT_OVERLAPLOG_DEFAULT;
     ZSTD_parameters params = ZSTD_getParams(compressionLevel, srcSize, 0);
+    int const overlapLog = ZSTDMT_overlapLog_default(params.cParams.strategy);
     params.fParams.contentSizeFlag = 1;
     return ZSTDMT_compress_advanced(mtctx, dst, dstCapacity, src, srcSize, NULL, params, overlapLog);
 }
@@ -1339,8 +1398,8 @@
     if (params.nbWorkers != mtctx->params.nbWorkers)
         CHECK_F( ZSTDMT_resize(mtctx, params.nbWorkers) );
 
-    if (params.jobSize > 0 && params.jobSize < ZSTDMT_JOBSIZE_MIN) params.jobSize = ZSTDMT_JOBSIZE_MIN;
-    if (params.jobSize > ZSTDMT_JOBSIZE_MAX) params.jobSize = ZSTDMT_JOBSIZE_MAX;
+    if (params.jobSize != 0 && params.jobSize < ZSTDMT_JOBSIZE_MIN) params.jobSize = ZSTDMT_JOBSIZE_MIN;
+    if (params.jobSize > (size_t)ZSTDMT_JOBSIZE_MAX) params.jobSize = ZSTDMT_JOBSIZE_MAX;
 
     mtctx->singleBlockingThread = (pledgedSrcSize <= ZSTDMT_JOBSIZE_MIN);  /* do not trigger multi-threading when srcSize is too small */
     if (mtctx->singleBlockingThread) {
@@ -1375,14 +1434,24 @@
         mtctx->cdict = cdict;
     }
 
-    mtctx->targetPrefixSize = (size_t)1 << ZSTDMT_computeOverlapLog(params);
-    DEBUGLOG(4, "overlapLog=%u => %u KB", params.overlapSizeLog, (U32)(mtctx->targetPrefixSize>>10));
+    mtctx->targetPrefixSize = ZSTDMT_computeOverlapSize(params);
+    DEBUGLOG(4, "overlapLog=%i => %u KB", params.overlapLog, (U32)(mtctx->targetPrefixSize>>10));
     mtctx->targetSectionSize = params.jobSize;
     if (mtctx->targetSectionSize == 0) {
         mtctx->targetSectionSize = 1ULL << ZSTDMT_computeTargetJobLog(params);
     }
+    if (params.rsyncable) {
+        /* Aim for the targetsectionSize as the average job size. */
+        U32 const jobSizeMB = (U32)(mtctx->targetSectionSize >> 20);
+        U32 const rsyncBits = ZSTD_highbit32(jobSizeMB) + 20;
+        assert(jobSizeMB >= 1);
+        DEBUGLOG(4, "rsyncLog = %u", rsyncBits);
+        mtctx->rsync.hash = 0;
+        mtctx->rsync.hitMask = (1ULL << rsyncBits) - 1;
+        mtctx->rsync.primePower = ZSTD_rollingHash_primePower(RSYNC_LENGTH);
+    }
     if (mtctx->targetSectionSize < mtctx->targetPrefixSize) mtctx->targetSectionSize = mtctx->targetPrefixSize;  /* job size must be >= overlap size */
-    DEBUGLOG(4, "Job Size : %u KB (note : set to %u)", (U32)(mtctx->targetSectionSize>>10), params.jobSize);
+    DEBUGLOG(4, "Job Size : %u KB (note : set to %u)", (U32)(mtctx->targetSectionSize>>10), (U32)params.jobSize);
     DEBUGLOG(4, "inBuff Size : %u KB", (U32)(mtctx->targetSectionSize>>10));
     ZSTDMT_setBufferSize(mtctx->bufPool, ZSTD_compressBound(mtctx->targetSectionSize));
     {
@@ -1818,6 +1887,89 @@
     return 1;
 }
 
+typedef struct {
+  size_t toLoad;  /* The number of bytes to load from the input. */
+  int flush;      /* Boolean declaring if we must flush because we found a synchronization point. */
+} syncPoint_t;
+
+/**
+ * Searches through the input for a synchronization point. If one is found, we
+ * will instruct the caller to flush, and return the number of bytes to load.
+ * Otherwise, we will load as many bytes as possible and instruct the caller
+ * to continue as normal.
+ */
+static syncPoint_t
+findSynchronizationPoint(ZSTDMT_CCtx const* mtctx, ZSTD_inBuffer const input)
+{
+    BYTE const* const istart = (BYTE const*)input.src + input.pos;
+    U64 const primePower = mtctx->rsync.primePower;
+    U64 const hitMask = mtctx->rsync.hitMask;
+
+    syncPoint_t syncPoint;
+    U64 hash;
+    BYTE const* prev;
+    size_t pos;
+
+    syncPoint.toLoad = MIN(input.size - input.pos, mtctx->targetSectionSize - mtctx->inBuff.filled);
+    syncPoint.flush = 0;
+    if (!mtctx->params.rsyncable)
+        /* Rsync is disabled. */
+        return syncPoint;
+    if (mtctx->inBuff.filled + syncPoint.toLoad < RSYNC_LENGTH)
+        /* Not enough to compute the hash.
+         * We will miss any synchronization points in this RSYNC_LENGTH byte
+         * window. However, since it depends only in the internal buffers, if the
+         * state is already synchronized, we will remain synchronized.
+         * Additionally, the probability that we miss a synchronization point is
+         * low: RSYNC_LENGTH / targetSectionSize.
+         */
+        return syncPoint;
+    /* Initialize the loop variables. */
+    if (mtctx->inBuff.filled >= RSYNC_LENGTH) {
+        /* We have enough bytes buffered to initialize the hash.
+         * Start scanning at the beginning of the input.
+         */
+        pos = 0;
+        prev = (BYTE const*)mtctx->inBuff.buffer.start + mtctx->inBuff.filled - RSYNC_LENGTH;
+        hash = ZSTD_rollingHash_compute(prev, RSYNC_LENGTH);
+    } else {
+        /* We don't have enough bytes buffered to initialize the hash, but
+         * we know we have at least RSYNC_LENGTH bytes total.
+         * Start scanning after the first RSYNC_LENGTH bytes less the bytes
+         * already buffered.
+         */
+        pos = RSYNC_LENGTH - mtctx->inBuff.filled;
+        prev = (BYTE const*)mtctx->inBuff.buffer.start - pos;
+        hash = ZSTD_rollingHash_compute(mtctx->inBuff.buffer.start, mtctx->inBuff.filled);
+        hash = ZSTD_rollingHash_append(hash, istart, pos);
+    }
+    /* Starting with the hash of the previous RSYNC_LENGTH bytes, roll
+     * through the input. If we hit a synchronization point, then cut the
+     * job off, and tell the compressor to flush the job. Otherwise, load
+     * all the bytes and continue as normal.
+     * If we go too long without a synchronization point (targetSectionSize)
+     * then a block will be emitted anyways, but this is okay, since if we
+     * are already synchronized we will remain synchronized.
+     */
+    for (; pos < syncPoint.toLoad; ++pos) {
+        BYTE const toRemove = pos < RSYNC_LENGTH ? prev[pos] : istart[pos - RSYNC_LENGTH];
+        /* if (pos >= RSYNC_LENGTH) assert(ZSTD_rollingHash_compute(istart + pos - RSYNC_LENGTH, RSYNC_LENGTH) == hash); */
+        hash = ZSTD_rollingHash_rotate(hash, toRemove, istart[pos], primePower);
+        if ((hash & hitMask) == hitMask) {
+            syncPoint.toLoad = pos + 1;
+            syncPoint.flush = 1;
+            break;
+        }
+    }
+    return syncPoint;
+}
+
+size_t ZSTDMT_nextInputSizeHint(const ZSTDMT_CCtx* mtctx)
+{
+    size_t hintInSize = mtctx->targetSectionSize - mtctx->inBuff.filled;
+    if (hintInSize==0) hintInSize = mtctx->targetSectionSize;
+    return hintInSize;
+}
 
 /** ZSTDMT_compressStream_generic() :
  *  internal use only - exposed to be invoked from zstd_compress.c
@@ -1844,7 +1996,8 @@
     }
 
     /* single-pass shortcut (note : synchronous-mode) */
-    if ( (mtctx->nextJobID == 0)      /* just started */
+    if ( (!mtctx->params.rsyncable)   /* rsyncable mode is disabled */
+      && (mtctx->nextJobID == 0)      /* just started */
       && (mtctx->inBuff.filled == 0)  /* nothing buffered */
       && (!mtctx->jobReady)           /* no job already created */
       && (endOp == ZSTD_e_end)        /* end order */
@@ -1876,14 +2029,17 @@
                 DEBUGLOG(5, "ZSTDMT_tryGetInputRange completed successfully : mtctx->inBuff.buffer.start = %p", mtctx->inBuff.buffer.start);
         }
         if (mtctx->inBuff.buffer.start != NULL) {
-            size_t const toLoad = MIN(input->size - input->pos, mtctx->targetSectionSize - mtctx->inBuff.filled);
+            syncPoint_t const syncPoint = findSynchronizationPoint(mtctx, *input);
+            if (syncPoint.flush && endOp == ZSTD_e_continue) {
+                endOp = ZSTD_e_flush;
+            }
             assert(mtctx->inBuff.buffer.capacity >= mtctx->targetSectionSize);
             DEBUGLOG(5, "ZSTDMT_compressStream_generic: adding %u bytes on top of %u to buffer of size %u",
-                        (U32)toLoad, (U32)mtctx->inBuff.filled, (U32)mtctx->targetSectionSize);
-            memcpy((char*)mtctx->inBuff.buffer.start + mtctx->inBuff.filled, (const char*)input->src + input->pos, toLoad);
-            input->pos += toLoad;
-            mtctx->inBuff.filled += toLoad;
-            forwardInputProgress = toLoad>0;
+                        (U32)syncPoint.toLoad, (U32)mtctx->inBuff.filled, (U32)mtctx->targetSectionSize);
+            memcpy((char*)mtctx->inBuff.buffer.start + mtctx->inBuff.filled, (const char*)input->src + input->pos, syncPoint.toLoad);
+            input->pos += syncPoint.toLoad;
+            mtctx->inBuff.filled += syncPoint.toLoad;
+            forwardInputProgress = syncPoint.toLoad>0;
         }
         if ((input->pos < input->size) && (endOp == ZSTD_e_end))
             endOp = ZSTD_e_flush;   /* can't end now : not all input consumed */
diff --git a/contrib/python-zstandard/zstd/decompress/huf_decompress.c b/contrib/python-zstandard/zstd/decompress/huf_decompress.c
--- a/contrib/python-zstandard/zstd/decompress/huf_decompress.c
+++ b/contrib/python-zstandard/zstd/decompress/huf_decompress.c
@@ -43,6 +43,19 @@
 #include "huf.h"
 #include "error_private.h"
 
+/* **************************************************************
+*  Macros
+****************************************************************/
+
+/* These two optional macros force the use one way or another of the two
+ * Huffman decompression implementations. You can't force in both directions
+ * at the same time.
+ */
+#if defined(HUF_FORCE_DECOMPRESS_X1) && \
+    defined(HUF_FORCE_DECOMPRESS_X2)
+#error "Cannot force the use of the X1 and X2 decoders at the same time!"
+#endif
+
 
 /* **************************************************************
 *  Error Management
@@ -58,6 +71,51 @@
 #define HUF_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
 
 
+/* **************************************************************
+*  BMI2 Variant Wrappers
+****************************************************************/
+#if DYNAMIC_BMI2
+
+#define HUF_DGEN(fn)                                                        \
+                                                                            \
+    static size_t fn##_default(                                             \
+                  void* dst,  size_t dstSize,                               \
+            const void* cSrc, size_t cSrcSize,                              \
+            const HUF_DTable* DTable)                                       \
+    {                                                                       \
+        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
+    }                                                                       \
+                                                                            \
+    static TARGET_ATTRIBUTE("bmi2") size_t fn##_bmi2(                       \
+                  void* dst,  size_t dstSize,                               \
+            const void* cSrc, size_t cSrcSize,                              \
+            const HUF_DTable* DTable)                                       \
+    {                                                                       \
+        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
+    }                                                                       \
+                                                                            \
+    static size_t fn(void* dst, size_t dstSize, void const* cSrc,           \
+                     size_t cSrcSize, HUF_DTable const* DTable, int bmi2)   \
+    {                                                                       \
+        if (bmi2) {                                                         \
+            return fn##_bmi2(dst, dstSize, cSrc, cSrcSize, DTable);         \
+        }                                                                   \
+        return fn##_default(dst, dstSize, cSrc, cSrcSize, DTable);          \
+    }
+
+#else
+
+#define HUF_DGEN(fn)                                                        \
+    static size_t fn(void* dst, size_t dstSize, void const* cSrc,           \
+                     size_t cSrcSize, HUF_DTable const* DTable, int bmi2)   \
+    {                                                                       \
+        (void)bmi2;                                                         \
+        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
+    }
+
+#endif
+
+
 /*-***************************/
 /*  generic DTableDesc       */
 /*-***************************/
@@ -71,6 +129,8 @@
 }
 
 
+#ifndef HUF_FORCE_DECOMPRESS_X2
+
 /*-***************************/
 /*  single-symbol decoding   */
 /*-***************************/
@@ -307,46 +367,6 @@
                                                const void *cSrc,
                                                size_t cSrcSize,
                                                const HUF_DTable *DTable);
-#if DYNAMIC_BMI2
-
-#define HUF_DGEN(fn)                                                               \
-                                                                            \
-    static size_t fn##_default(                                             \
-                  void* dst,  size_t dstSize,                               \
-            const void* cSrc, size_t cSrcSize,                              \
-            const HUF_DTable* DTable)                                       \
-    {                                                                       \
-        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
-    }                                                                       \
-                                                                            \
-    static TARGET_ATTRIBUTE("bmi2") size_t fn##_bmi2(                       \
-                  void* dst,  size_t dstSize,                               \
-            const void* cSrc, size_t cSrcSize,                              \
-            const HUF_DTable* DTable)                                       \
-    {                                                                       \
-        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
-    }                                                                       \
-                                                                            \
-    static size_t fn(void* dst, size_t dstSize, void const* cSrc,           \
-                     size_t cSrcSize, HUF_DTable const* DTable, int bmi2)   \
-    {                                                                       \
-        if (bmi2) {                                                         \
-            return fn##_bmi2(dst, dstSize, cSrc, cSrcSize, DTable);         \
-        }                                                                   \
-        return fn##_default(dst, dstSize, cSrc, cSrcSize, DTable);          \
-    }
-
-#else
-
-#define HUF_DGEN(fn)                                                               \
-    static size_t fn(void* dst, size_t dstSize, void const* cSrc,           \
-                     size_t cSrcSize, HUF_DTable const* DTable, int bmi2)   \
-    {                                                                       \
-        (void)bmi2;                                                         \
-        return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable);             \
-    }
-
-#endif
 
 HUF_DGEN(HUF_decompress1X1_usingDTable_internal)
 HUF_DGEN(HUF_decompress4X1_usingDTable_internal)
@@ -437,6 +457,10 @@
     return HUF_decompress4X1_DCtx(DTable, dst, dstSize, cSrc, cSrcSize);
 }
 
+#endif /* HUF_FORCE_DECOMPRESS_X2 */
+
+
+#ifndef HUF_FORCE_DECOMPRESS_X1
 
 /* *************************/
 /* double-symbols decoding */
@@ -911,6 +935,8 @@
     return HUF_decompress4X2_DCtx(DTable, dst, dstSize, cSrc, cSrcSize);
 }
 
+#endif /* HUF_FORCE_DECOMPRESS_X1 */
+
 
 /* ***********************************/
 /* Universal decompression selectors */
@@ -921,8 +947,18 @@
                                     const HUF_DTable* DTable)
 {
     DTableDesc const dtd = HUF_getDTableDesc(DTable);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+    (void)dtd;
+    assert(dtd.tableType == 0);
+    return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+    (void)dtd;
+    assert(dtd.tableType == 1);
+    return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#else
     return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) :
                            HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#endif
 }
 
 size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize,
@@ -930,11 +966,22 @@
                                     const HUF_DTable* DTable)
 {
     DTableDesc const dtd = HUF_getDTableDesc(DTable);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+    (void)dtd;
+    assert(dtd.tableType == 0);
+    return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+    (void)dtd;
+    assert(dtd.tableType == 1);
+    return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#else
     return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) :
                            HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+#endif
 }
 
 
+#if !defined(HUF_FORCE_DECOMPRESS_X1) && !defined(HUF_FORCE_DECOMPRESS_X2)
 typedef struct { U32 tableTime; U32 decode256Time; } algo_time_t;
 static const algo_time_t algoTime[16 /* Quantization */][3 /* single, double, quad */] =
 {
@@ -956,6 +1003,7 @@
     {{1455,128}, {2422,124}, {4174,124}},   /* Q ==14 : 87-93% */
     {{ 722,128}, {1891,145}, {1936,146}},   /* Q ==15 : 93-99% */
 };
+#endif
 
 /** HUF_selectDecoder() :
  *  Tells which decoder is likely to decode faster,
@@ -966,6 +1014,15 @@
 {
     assert(dstSize > 0);
     assert(dstSize <= 128*1024);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+    (void)dstSize;
+    (void)cSrcSize;
+    return 0;
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+    (void)dstSize;
+    (void)cSrcSize;
+    return 1;
+#else
     /* decoder timing evaluation */
     {   U32 const Q = (cSrcSize >= dstSize) ? 15 : (U32)(cSrcSize * 16 / dstSize);   /* Q < 16 */
         U32 const D256 = (U32)(dstSize >> 8);
@@ -973,14 +1030,18 @@
         U32 DTime1 = algoTime[Q][1].tableTime + (algoTime[Q][1].decode256Time * D256);
         DTime1 += DTime1 >> 3;  /* advantage to algorithm using less memory, to reduce cache eviction */
         return DTime1 < DTime0;
-}   }
+    }
+#endif
+}
 
 
 typedef size_t (*decompressionAlgo)(void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);
 
 size_t HUF_decompress (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize)
 {
+#if !defined(HUF_FORCE_DECOMPRESS_X1) && !defined(HUF_FORCE_DECOMPRESS_X2)
     static const decompressionAlgo decompress[2] = { HUF_decompress4X1, HUF_decompress4X2 };
+#endif
 
     /* validation checks */
     if (dstSize == 0) return ERROR(dstSize_tooSmall);
@@ -989,7 +1050,17 @@
     if (cSrcSize == 1) { memset(dst, *(const BYTE*)cSrc, dstSize); return dstSize; }   /* RLE */
 
     {   U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+        (void)algoNb;
+        assert(algoNb == 0);
+        return HUF_decompress4X1(dst, dstSize, cSrc, cSrcSize);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+        (void)algoNb;
+        assert(algoNb == 1);
+        return HUF_decompress4X2(dst, dstSize, cSrc, cSrcSize);
+#else
         return decompress[algoNb](dst, dstSize, cSrc, cSrcSize);
+#endif
     }
 }
 
@@ -1002,8 +1073,18 @@
     if (cSrcSize == 1) { memset(dst, *(const BYTE*)cSrc, dstSize); return dstSize; }   /* RLE */
 
     {   U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+        (void)algoNb;
+        assert(algoNb == 0);
+        return HUF_decompress4X1_DCtx(dctx, dst, dstSize, cSrc, cSrcSize);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+        (void)algoNb;
+        assert(algoNb == 1);
+        return HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize);
+#else
         return algoNb ? HUF_decompress4X2_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) :
                         HUF_decompress4X1_DCtx(dctx, dst, dstSize, cSrc, cSrcSize) ;
+#endif
     }
 }
 
@@ -1025,8 +1106,19 @@
     if (cSrcSize == 0) return ERROR(corruption_detected);
 
     {   U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
-        return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize):
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+        (void)algoNb;
+        assert(algoNb == 0);
+        return HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+        (void)algoNb;
+        assert(algoNb == 1);
+        return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize);
+#else
+        return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc,
+                            cSrcSize, workSpace, wkspSize):
                         HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize);
+#endif
     }
 }
 
@@ -1041,10 +1133,22 @@
     if (cSrcSize == 1) { memset(dst, *(const BYTE*)cSrc, dstSize); return dstSize; }   /* RLE */
 
     {   U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+        (void)algoNb;
+        assert(algoNb == 0);
+        return HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cSrc,
+                                cSrcSize, workSpace, wkspSize);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+        (void)algoNb;
+        assert(algoNb == 1);
+        return HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc,
+                                cSrcSize, workSpace, wkspSize);
+#else
         return algoNb ? HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc,
                                 cSrcSize, workSpace, wkspSize):
                         HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cSrc,
                                 cSrcSize, workSpace, wkspSize);
+#endif
     }
 }
 
@@ -1060,10 +1164,21 @@
 size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2)
 {
     DTableDesc const dtd = HUF_getDTableDesc(DTable);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+    (void)dtd;
+    assert(dtd.tableType == 0);
+    return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+    (void)dtd;
+    assert(dtd.tableType == 1);
+    return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#else
     return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2) :
                            HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#endif
 }
 
+#ifndef HUF_FORCE_DECOMPRESS_X2
 size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2)
 {
     const BYTE* ip = (const BYTE*) cSrc;
@@ -1075,12 +1190,23 @@
 
     return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx, bmi2);
 }
+#endif
 
 size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2)
 {
     DTableDesc const dtd = HUF_getDTableDesc(DTable);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+    (void)dtd;
+    assert(dtd.tableType == 0);
+    return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+    (void)dtd;
+    assert(dtd.tableType == 1);
+    return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#else
     return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2) :
                            HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
+#endif
 }
 
 size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2)
@@ -1090,7 +1216,17 @@
     if (cSrcSize == 0) return ERROR(corruption_detected);
 
     {   U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
+#if defined(HUF_FORCE_DECOMPRESS_X1)
+        (void)algoNb;
+        assert(algoNb == 0);
+        return HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2);
+#elif defined(HUF_FORCE_DECOMPRESS_X2)
+        (void)algoNb;
+        assert(algoNb == 1);
+        return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2);
+#else
         return algoNb ? HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2) :
                         HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2);
+#endif
     }
 }
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_ddict.h b/contrib/python-zstandard/zstd/decompress/zstd_ddict.h
new file mode 100644
--- /dev/null
+++ b/contrib/python-zstandard/zstd/decompress/zstd_ddict.h
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+
+#ifndef ZSTD_DDICT_H
+#define ZSTD_DDICT_H
+
+/*-*******************************************************
+ *  Dependencies
+ *********************************************************/
+#include <stddef.h>   /* size_t */
+#include "zstd.h"     /* ZSTD_DDict, and several public functions */
+
+
+/*-*******************************************************
+ *  Interface
+ *********************************************************/
+
+/* note: several prototypes are already published in `zstd.h` :
+ * ZSTD_createDDict()
+ * ZSTD_createDDict_byReference()
+ * ZSTD_createDDict_advanced()
+ * ZSTD_freeDDict()
+ * ZSTD_initStaticDDict()
+ * ZSTD_sizeof_DDict()
+ * ZSTD_estimateDDictSize()
+ * ZSTD_getDictID_fromDict()
+ */
+
+const void* ZSTD_DDict_dictContent(const ZSTD_DDict* ddict);
+size_t ZSTD_DDict_dictSize(const ZSTD_DDict* ddict);
+
+void ZSTD_copyDDictParameters(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict);
+
+
+
+#endif /* ZSTD_DDICT_H */
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_ddict.c b/contrib/python-zstandard/zstd/decompress/zstd_ddict.c
new file mode 100644
--- /dev/null
+++ b/contrib/python-zstandard/zstd/decompress/zstd_ddict.c
@@ -0,0 +1,240 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+/* zstd_ddict.c :
+ * concentrates all logic that needs to know the internals of ZSTD_DDict object */
+
+/*-*******************************************************
+*  Dependencies
+*********************************************************/
+#include <string.h>      /* memcpy, memmove, memset */
+#include "cpu.h"         /* bmi2 */
+#include "mem.h"         /* low level memory routines */
+#define FSE_STATIC_LINKING_ONLY
+#include "fse.h"
+#define HUF_STATIC_LINKING_ONLY
+#include "huf.h"
+#include "zstd_decompress_internal.h"
+#include "zstd_ddict.h"
+
+#if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1)
+#  include "zstd_legacy.h"
+#endif
+
+
+
+/*-*******************************************************
+*  Types
+*********************************************************/
+struct ZSTD_DDict_s {
+    void* dictBuffer;
+    const void* dictContent;
+    size_t dictSize;
+    ZSTD_entropyDTables_t entropy;
+    U32 dictID;
+    U32 entropyPresent;
+    ZSTD_customMem cMem;
+};  /* typedef'd to ZSTD_DDict within "zstd.h" */
+
+const void* ZSTD_DDict_dictContent(const ZSTD_DDict* ddict)
+{
+    assert(ddict != NULL);
+    return ddict->dictContent;
+}
+
+size_t ZSTD_DDict_dictSize(const ZSTD_DDict* ddict)
+{
+    assert(ddict != NULL);
+    return ddict->dictSize;
+}
+
+void ZSTD_copyDDictParameters(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict)
+{
+    DEBUGLOG(4, "ZSTD_copyDDictParameters");
+    assert(dctx != NULL);
+    assert(ddict != NULL);
+    dctx->dictID = ddict->dictID;
+    dctx->prefixStart = ddict->dictContent;
+    dctx->virtualStart = ddict->dictContent;
+    dctx->dictEnd = (const BYTE*)ddict->dictContent + ddict->dictSize;
+    dctx->previousDstEnd = dctx->dictEnd;
+    if (ddict->entropyPresent) {
+        dctx->litEntropy = 1;
+        dctx->fseEntropy = 1;
+        dctx->LLTptr = ddict->entropy.LLTable;
+        dctx->MLTptr = ddict->entropy.MLTable;
+        dctx->OFTptr = ddict->entropy.OFTable;
+        dctx->HUFptr = ddict->entropy.hufTable;
+        dctx->entropy.rep[0] = ddict->entropy.rep[0];
+        dctx->entropy.rep[1] = ddict->entropy.rep[1];
+        dctx->entropy.rep[2] = ddict->entropy.rep[2];
+    } else {
+        dctx->litEntropy = 0;
+        dctx->fseEntropy = 0;
+    }
+}
+
+
+static size_t
+ZSTD_loadEntropy_intoDDict(ZSTD_DDict* ddict,
+                           ZSTD_dictContentType_e dictContentType)
+{
+    ddict->dictID = 0;
+    ddict->entropyPresent = 0;
+    if (dictContentType == ZSTD_dct_rawContent) return 0;
+
+    if (ddict->dictSize < 8) {
+        if (dictContentType == ZSTD_dct_fullDict)
+            return ERROR(dictionary_corrupted);   /* only accept specified dictionaries */
+        return 0;   /* pure content mode */
+    }
+    {   U32 const magic = MEM_readLE32(ddict->dictContent);
+        if (magic != ZSTD_MAGIC_DICTIONARY) {
+            if (dictContentType == ZSTD_dct_fullDict)
+                return ERROR(dictionary_corrupted);   /* only accept specified dictionaries */
+            return 0;   /* pure content mode */
+        }
+    }
+    ddict->dictID = MEM_readLE32((const char*)ddict->dictContent + ZSTD_FRAMEIDSIZE);
+
+    /* load entropy tables */
+    CHECK_E( ZSTD_loadDEntropy(&ddict->entropy,
+                                ddict->dictContent, ddict->dictSize),
+             dictionary_corrupted );
+    ddict->entropyPresent = 1;
+    return 0;
+}
+
+
+static size_t ZSTD_initDDict_internal(ZSTD_DDict* ddict,
+                                      const void* dict, size_t dictSize,
+                                      ZSTD_dictLoadMethod_e dictLoadMethod,
+                                      ZSTD_dictContentType_e dictContentType)
+{
+    if ((dictLoadMethod == ZSTD_dlm_byRef) || (!dict) || (!dictSize)) {
+        ddict->dictBuffer = NULL;
+        ddict->dictContent = dict;
+        if (!dict) dictSize = 0;
+    } else {
+        void* const internalBuffer = ZSTD_malloc(dictSize, ddict->cMem);
+        ddict->dictBuffer = internalBuffer;
+        ddict->dictContent = internalBuffer;
+        if (!internalBuffer) return ERROR(memory_allocation);
+        memcpy(internalBuffer, dict, dictSize);
+    }
+    ddict->dictSize = dictSize;
+    ddict->entropy.hufTable[0] = (HUF_DTable)((HufLog)*0x1000001);  /* cover both little and big endian */
+
+    /* parse dictionary content */
+    CHECK_F( ZSTD_loadEntropy_intoDDict(ddict, dictContentType) );
+
+    return 0;
+}
+
+ZSTD_DDict* ZSTD_createDDict_advanced(const void* dict, size_t dictSize,
+                                      ZSTD_dictLoadMethod_e dictLoadMethod,
+                                      ZSTD_dictContentType_e dictContentType,
+                                      ZSTD_customMem customMem)
+{
+    if (!customMem.customAlloc ^ !customMem.customFree) return NULL;
+
+    {   ZSTD_DDict* const ddict = (ZSTD_DDict*) ZSTD_malloc(sizeof(ZSTD_DDict), customMem);
+        if (ddict == NULL) return NULL;
+        ddict->cMem = customMem;
+        {   size_t const initResult = ZSTD_initDDict_internal(ddict,
+                                            dict, dictSize,
+                                            dictLoadMethod, dictContentType);
+            if (ZSTD_isError(initResult)) {
+                ZSTD_freeDDict(ddict);
+                return NULL;
+        }   }
+        return ddict;
+    }
+}
+
+/*! ZSTD_createDDict() :
+*   Create a digested dictionary, to start decompression without startup delay.
+*   `dict` content is copied inside DDict.
+*   Consequently, `dict` can be released after `ZSTD_DDict` creation */
+ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize)
+{
+    ZSTD_customMem const allocator = { NULL, NULL, NULL };
+    return ZSTD_createDDict_advanced(dict, dictSize, ZSTD_dlm_byCopy, ZSTD_dct_auto, allocator);
+}
+
+/*! ZSTD_createDDict_byReference() :
+ *  Create a digested dictionary, to start decompression without startup delay.
+ *  Dictionary content is simply referenced, it will be accessed during decompression.
+ *  Warning : dictBuffer must outlive DDict (DDict must be freed before dictBuffer) */
+ZSTD_DDict* ZSTD_createDDict_byReference(const void* dictBuffer, size_t dictSize)
+{
+    ZSTD_customMem const allocator = { NULL, NULL, NULL };
+    return ZSTD_createDDict_advanced(dictBuffer, dictSize, ZSTD_dlm_byRef, ZSTD_dct_auto, allocator);
+}
+
+
+const ZSTD_DDict* ZSTD_initStaticDDict(
+                                void* sBuffer, size_t sBufferSize,
+                                const void* dict, size_t dictSize,
+                                ZSTD_dictLoadMethod_e dictLoadMethod,
+                                ZSTD_dictContentType_e dictContentType)
+{
+    size_t const neededSpace = sizeof(ZSTD_DDict)
+                             + (dictLoadMethod == ZSTD_dlm_byRef ? 0 : dictSize);
+    ZSTD_DDict* const ddict = (ZSTD_DDict*)sBuffer;
+    assert(sBuffer != NULL);
+    assert(dict != NULL);
+    if ((size_t)sBuffer & 7) return NULL;   /* 8-aligned */
+    if (sBufferSize < neededSpace) return NULL;
+    if (dictLoadMethod == ZSTD_dlm_byCopy) {
+        memcpy(ddict+1, dict, dictSize);  /* local copy */
+        dict = ddict+1;
+    }
+    if (ZSTD_isError( ZSTD_initDDict_internal(ddict,
+                                              dict, dictSize,
+                                              ZSTD_dlm_byRef, dictContentType) ))
+        return NULL;
+    return ddict;
+}
+
+
+size_t ZSTD_freeDDict(ZSTD_DDict* ddict)
+{
+    if (ddict==NULL) return 0;   /* support free on NULL */
+    {   ZSTD_customMem const cMem = ddict->cMem;
+        ZSTD_free(ddict->dictBuffer, cMem);
+        ZSTD_free(ddict, cMem);
+        return 0;
+    }
+}
+
+/*! ZSTD_estimateDDictSize() :
+ *  Estimate amount of memory that will be needed to create a dictionary for decompression.
+ *  Note : dictionary created by reference using ZSTD_dlm_byRef are smaller */
+size_t ZSTD_estimateDDictSize(size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod)
+{
+    return sizeof(ZSTD_DDict) + (dictLoadMethod == ZSTD_dlm_byRef ? 0 : dictSize);
+}
+
+size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict)
+{
+    if (ddict==NULL) return 0;   /* support sizeof on NULL */
+    return sizeof(*ddict) + (ddict->dictBuffer ? ddict->dictSize : 0) ;
+}
+
+/*! ZSTD_getDictID_fromDDict() :
+ *  Provides the dictID of the dictionary loaded into `ddict`.
+ *  If @return == 0, the dictionary is not conformant to Zstandard specification, or empty.
+ *  Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */
+unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict)
+{
+    if (ddict==NULL) return 0;
+    return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize);
+}
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_decompress.c b/contrib/python-zstandard/zstd/decompress/zstd_decompress.c
--- a/contrib/python-zstandard/zstd/decompress/zstd_decompress.c
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress.c
@@ -37,12 +37,12 @@
  *  It's possible to set a different limit using ZSTD_DCtx_setMaxWindowSize().
  */
 #ifndef ZSTD_MAXWINDOWSIZE_DEFAULT
-#  define ZSTD_MAXWINDOWSIZE_DEFAULT (((U32)1 << ZSTD_WINDOWLOG_DEFAULTMAX) + 1)
+#  define ZSTD_MAXWINDOWSIZE_DEFAULT (((U32)1 << ZSTD_WINDOWLOG_LIMIT_DEFAULT) + 1)
 #endif
 
 /*!
  *  NO_FORWARD_PROGRESS_MAX :
- *  maximum allowed nb of calls to ZSTD_decompressStream() and ZSTD_decompress_generic()
+ *  maximum allowed nb of calls to ZSTD_decompressStream()
  *  without any forward progress
  *  (defined as: no byte read from input, and no byte flushed to output)
  *  before triggering an error.
@@ -56,128 +56,25 @@
 *  Dependencies
 *********************************************************/
 #include <string.h>      /* memcpy, memmove, memset */
-#include "compiler.h"    /* prefetch */
 #include "cpu.h"         /* bmi2 */
 #include "mem.h"         /* low level memory routines */
 #define FSE_STATIC_LINKING_ONLY
 #include "fse.h"
 #define HUF_STATIC_LINKING_ONLY
 #include "huf.h"
-#include "zstd_internal.h"
+#include "zstd_internal.h"  /* blockProperties_t */
+#include "zstd_decompress_internal.h"   /* ZSTD_DCtx */
+#include "zstd_ddict.h"  /* ZSTD_DDictDictContent */
+#include "zstd_decompress_block.h"   /* ZSTD_decompressBlock_internal */
 
 #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1)
 #  include "zstd_legacy.h"
 #endif
 
-static const void* ZSTD_DDictDictContent(const ZSTD_DDict* ddict);
-static size_t ZSTD_DDictDictSize(const ZSTD_DDict* ddict);
-
-
-/*-*************************************
-*  Errors
-***************************************/
-#define ZSTD_isError ERR_isError   /* for inlining */
-#define FSE_isError  ERR_isError
-#define HUF_isError  ERR_isError
-
-
-/*_*******************************************************
-*  Memory operations
-**********************************************************/
-static void ZSTD_copy4(void* dst, const void* src) { memcpy(dst, src, 4); }
-
 
 /*-*************************************************************
 *   Context management
 ***************************************************************/
-typedef enum { ZSTDds_getFrameHeaderSize, ZSTDds_decodeFrameHeader,
-               ZSTDds_decodeBlockHeader, ZSTDds_decompressBlock,
-               ZSTDds_decompressLastBlock, ZSTDds_checkChecksum,
-               ZSTDds_decodeSkippableHeader, ZSTDds_skipFrame } ZSTD_dStage;
-
-typedef enum { zdss_init=0, zdss_loadHeader,
-               zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage;
-
-
-typedef struct {
-    U32 fastMode;
-    U32 tableLog;
-} ZSTD_seqSymbol_header;
-
-typedef struct {
-    U16  nextState;
-    BYTE nbAdditionalBits;
-    BYTE nbBits;
-    U32  baseValue;
-} ZSTD_seqSymbol;
-
-#define SEQSYMBOL_TABLE_SIZE(log)   (1 + (1 << (log)))
-
-typedef struct {
-    ZSTD_seqSymbol LLTable[SEQSYMBOL_TABLE_SIZE(LLFSELog)];    /* Note : Space reserved for FSE Tables */
-    ZSTD_seqSymbol OFTable[SEQSYMBOL_TABLE_SIZE(OffFSELog)];   /* is also used as temporary workspace while building hufTable during DDict creation */
-    ZSTD_seqSymbol MLTable[SEQSYMBOL_TABLE_SIZE(MLFSELog)];    /* and therefore must be at least HUF_DECOMPRESS_WORKSPACE_SIZE large */
-    HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)];  /* can accommodate HUF_decompress4X */
-    U32 rep[ZSTD_REP_NUM];
-} ZSTD_entropyDTables_t;
-
-struct ZSTD_DCtx_s
-{
-    const ZSTD_seqSymbol* LLTptr;
-    const ZSTD_seqSymbol* MLTptr;
-    const ZSTD_seqSymbol* OFTptr;
-    const HUF_DTable* HUFptr;
-    ZSTD_entropyDTables_t entropy;
-    U32 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];   /* space needed when building huffman tables */
-    const void* previousDstEnd;   /* detect continuity */
-    const void* prefixStart;      /* start of current segment */
-    const void* virtualStart;     /* virtual start of previous segment if it was just before current one */
-    const void* dictEnd;          /* end of previous segment */
-    size_t expected;
-    ZSTD_frameHeader fParams;
-    U64 decodedSize;
-    blockType_e bType;            /* used in ZSTD_decompressContinue(), store blockType between block header decoding and block decompression stages */
-    ZSTD_dStage stage;
-    U32 litEntropy;
-    U32 fseEntropy;
-    XXH64_state_t xxhState;
-    size_t headerSize;
-    ZSTD_format_e format;
-    const BYTE* litPtr;
-    ZSTD_customMem customMem;
-    size_t litSize;
-    size_t rleSize;
-    size_t staticSize;
-    int bmi2;                     /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
-
-    /* dictionary */
-    ZSTD_DDict* ddictLocal;
-    const ZSTD_DDict* ddict;     /* set by ZSTD_initDStream_usingDDict(), or ZSTD_DCtx_refDDict() */
-    U32 dictID;
-    int ddictIsCold;             /* if == 1 : dictionary is "new" for working context, and presumed "cold" (not in cpu cache) */
-
-    /* streaming */
-    ZSTD_dStreamStage streamStage;
-    char*  inBuff;
-    size_t inBuffSize;
-    size_t inPos;
-    size_t maxWindowSize;
-    char*  outBuff;
-    size_t outBuffSize;
-    size_t outStart;
-    size_t outEnd;
-    size_t lhSize;
-    void* legacyContext;
-    U32 previousLegacyVersion;
-    U32 legacyVersion;
-    U32 hostageByte;
-    int noForwardProgress;
-
-    /* workspace */
-    BYTE litBuffer[ZSTD_BLOCKSIZE_MAX + WILDCOPY_OVERLENGTH];
-    BYTE headerBuffer[ZSTD_FRAMEHEADERSIZE_MAX];
-};  /* typedef'd to ZSTD_DCtx within "zstd.h" */
-
 size_t ZSTD_sizeof_DCtx (const ZSTD_DCtx* dctx)
 {
     if (dctx==NULL) return 0;   /* support sizeof NULL */
@@ -192,8 +89,8 @@
 static size_t ZSTD_startingInputLength(ZSTD_format_e format)
 {
     size_t const startingInputLength = (format==ZSTD_f_zstd1_magicless) ?
-                    ZSTD_frameHeaderSize_prefix - ZSTD_FRAMEIDSIZE :
-                    ZSTD_frameHeaderSize_prefix;
+                    ZSTD_FRAMEHEADERSIZE_PREFIX - ZSTD_FRAMEIDSIZE :
+                    ZSTD_FRAMEHEADERSIZE_PREFIX;
     ZSTD_STATIC_ASSERT(ZSTD_FRAMEHEADERSIZE_PREFIX >= ZSTD_FRAMEIDSIZE);
     /* only supports formats ZSTD_f_zstd1 and ZSTD_f_zstd1_magicless */
     assert( (format == ZSTD_f_zstd1) || (format == ZSTD_f_zstd1_magicless) );
@@ -290,7 +187,7 @@
     if (size < ZSTD_FRAMEIDSIZE) return 0;
     {   U32 const magic = MEM_readLE32(buffer);
         if (magic == ZSTD_MAGICNUMBER) return 1;
-        if ((magic & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) return 1;
+        if ((magic & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) return 1;
     }
 #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT >= 1)
     if (ZSTD_isLegacy(buffer, size)) return 1;
@@ -345,10 +242,10 @@
 
     if ( (format != ZSTD_f_zstd1_magicless)
       && (MEM_readLE32(src) != ZSTD_MAGICNUMBER) ) {
-        if ((MEM_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) {
+        if ((MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
             /* skippable frame */
-            if (srcSize < ZSTD_skippableHeaderSize)
-                return ZSTD_skippableHeaderSize; /* magic number + frame length */
+            if (srcSize < ZSTD_SKIPPABLEHEADERSIZE)
+                return ZSTD_SKIPPABLEHEADERSIZE; /* magic number + frame length */
             memset(zfhPtr, 0, sizeof(*zfhPtr));
             zfhPtr->frameContentSize = MEM_readLE32((const char *)src + ZSTD_FRAMEIDSIZE);
             zfhPtr->frameType = ZSTD_skippableFrame;
@@ -446,6 +343,21 @@
     }   }
 }
 
+static size_t readSkippableFrameSize(void const* src, size_t srcSize)
+{
+    size_t const skippableHeaderSize = ZSTD_SKIPPABLEHEADERSIZE;
+    U32 sizeU32;
+
+    if (srcSize < ZSTD_SKIPPABLEHEADERSIZE)
+        return ERROR(srcSize_wrong);
+
+    sizeU32 = MEM_readLE32((BYTE const*)src + ZSTD_FRAMEIDSIZE);
+    if ((U32)(sizeU32 + ZSTD_SKIPPABLEHEADERSIZE) < sizeU32)
+        return ERROR(frameParameter_unsupported);
+
+    return skippableHeaderSize + sizeU32;
+}
+
 /** ZSTD_findDecompressedSize() :
  *  compatible with legacy mode
  *  `srcSize` must be the exact length of some number of ZSTD compressed and/or
@@ -455,15 +367,13 @@
 {
     unsigned long long totalDstSize = 0;
 
-    while (srcSize >= ZSTD_frameHeaderSize_prefix) {
+    while (srcSize >= ZSTD_FRAMEHEADERSIZE_PREFIX) {
         U32 const magicNumber = MEM_readLE32(src);
 
-        if ((magicNumber & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) {
-            size_t skippableSize;
-            if (srcSize < ZSTD_skippableHeaderSize)
-                return ERROR(srcSize_wrong);
-            skippableSize = MEM_readLE32((const BYTE *)src + ZSTD_FRAMEIDSIZE)
-                          + ZSTD_skippableHeaderSize;
+        if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
+            size_t const skippableSize = readSkippableFrameSize(src, srcSize);
+            if (ZSTD_isError(skippableSize))
+                return skippableSize;
             if (srcSize < skippableSize) {
                 return ZSTD_CONTENTSIZE_ERROR;
             }
@@ -496,9 +406,9 @@
 }
 
 /** ZSTD_getDecompressedSize() :
-*   compatible with legacy mode
-*   @return : decompressed size if known, 0 otherwise
-              note : 0 can mean any of the following :
+ *  compatible with legacy mode
+ * @return : decompressed size if known, 0 otherwise
+             note : 0 can mean any of the following :
                    - frame content is empty
                    - decompressed size field is not present in frame header
                    - frame header unknown / not supported
@@ -512,8 +422,8 @@
 
 
 /** ZSTD_decodeFrameHeader() :
-*   `headerSize` must be the size provided by ZSTD_frameHeaderSize().
-*   @return : 0 if success, or an error code, which can be tested using ZSTD_isError() */
+ * `headerSize` must be the size provided by ZSTD_frameHeaderSize().
+ * @return : 0 if success, or an error code, which can be tested using ZSTD_isError() */
 static size_t ZSTD_decodeFrameHeader(ZSTD_DCtx* dctx, const void* src, size_t headerSize)
 {
     size_t const result = ZSTD_getFrameHeader_advanced(&(dctx->fParams), src, headerSize, dctx->format);
@@ -526,1275 +436,6 @@
 }
 
 
-/*-*************************************************************
- *   Block decoding
- ***************************************************************/
-
-/*! ZSTD_getcBlockSize() :
-*   Provides the size of compressed block from block header `src` */
-size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
-                          blockProperties_t* bpPtr)
-{
-    if (srcSize < ZSTD_blockHeaderSize) return ERROR(srcSize_wrong);
-    {   U32 const cBlockHeader = MEM_readLE24(src);
-        U32 const cSize = cBlockHeader >> 3;
-        bpPtr->lastBlock = cBlockHeader & 1;
-        bpPtr->blockType = (blockType_e)((cBlockHeader >> 1) & 3);
-        bpPtr->origSize = cSize;   /* only useful for RLE */
-        if (bpPtr->blockType == bt_rle) return 1;
-        if (bpPtr->blockType == bt_reserved) return ERROR(corruption_detected);
-        return cSize;
-    }
-}
-
-
-static size_t ZSTD_copyRawBlock(void* dst, size_t dstCapacity,
-                          const void* src, size_t srcSize)
-{
-    if (dst==NULL) return ERROR(dstSize_tooSmall);
-    if (srcSize > dstCapacity) return ERROR(dstSize_tooSmall);
-    memcpy(dst, src, srcSize);
-    return srcSize;
-}
-
-
-static size_t ZSTD_setRleBlock(void* dst, size_t dstCapacity,
-                         const void* src, size_t srcSize,
-                               size_t regenSize)
-{
-    if (srcSize != 1) return ERROR(srcSize_wrong);
-    if (regenSize > dstCapacity) return ERROR(dstSize_tooSmall);
-    memset(dst, *(const BYTE*)src, regenSize);
-    return regenSize;
-}
-
-/* Hidden declaration for fullbench */
-size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
-                          const void* src, size_t srcSize);
-/*! ZSTD_decodeLiteralsBlock() :
- * @return : nb of bytes read from src (< srcSize )
- *  note : symbol not declared but exposed for fullbench */
-size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
-                          const void* src, size_t srcSize)   /* note : srcSize < BLOCKSIZE */
-{
-    if (srcSize < MIN_CBLOCK_SIZE) return ERROR(corruption_detected);
-
-    {   const BYTE* const istart = (const BYTE*) src;
-        symbolEncodingType_e const litEncType = (symbolEncodingType_e)(istart[0] & 3);
-
-        switch(litEncType)
-        {
-        case set_repeat:
-            if (dctx->litEntropy==0) return ERROR(dictionary_corrupted);
-            /* fall-through */
-
-        case set_compressed:
-            if (srcSize < 5) return ERROR(corruption_detected);   /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need up to 5 for case 3 */
-            {   size_t lhSize, litSize, litCSize;
-                U32 singleStream=0;
-                U32 const lhlCode = (istart[0] >> 2) & 3;
-                U32 const lhc = MEM_readLE32(istart);
-                switch(lhlCode)
-                {
-                case 0: case 1: default:   /* note : default is impossible, since lhlCode into [0..3] */
-                    /* 2 - 2 - 10 - 10 */
-                    singleStream = !lhlCode;
-                    lhSize = 3;
-                    litSize  = (lhc >> 4) & 0x3FF;
-                    litCSize = (lhc >> 14) & 0x3FF;
-                    break;
-                case 2:
-                    /* 2 - 2 - 14 - 14 */
-                    lhSize = 4;
-                    litSize  = (lhc >> 4) & 0x3FFF;
-                    litCSize = lhc >> 18;
-                    break;
-                case 3:
-                    /* 2 - 2 - 18 - 18 */
-                    lhSize = 5;
-                    litSize  = (lhc >> 4) & 0x3FFFF;
-                    litCSize = (lhc >> 22) + (istart[4] << 10);
-                    break;
-                }
-                if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
-                if (litCSize + lhSize > srcSize) return ERROR(corruption_detected);
-
-                /* prefetch huffman table if cold */
-                if (dctx->ddictIsCold && (litSize > 768 /* heuristic */)) {
-                    PREFETCH_AREA(dctx->HUFptr, sizeof(dctx->entropy.hufTable));
-                }
-
-                if (HUF_isError((litEncType==set_repeat) ?
-                                    ( singleStream ?
-                                        HUF_decompress1X_usingDTable_bmi2(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr, dctx->bmi2) :
-                                        HUF_decompress4X_usingDTable_bmi2(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr, dctx->bmi2) ) :
-                                    ( singleStream ?
-                                        HUF_decompress1X1_DCtx_wksp_bmi2(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize,
-                                                                         dctx->workspace, sizeof(dctx->workspace), dctx->bmi2) :
-                                        HUF_decompress4X_hufOnly_wksp_bmi2(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize,
-                                                                           dctx->workspace, sizeof(dctx->workspace), dctx->bmi2))))
-                    return ERROR(corruption_detected);
-
-                dctx->litPtr = dctx->litBuffer;
-                dctx->litSize = litSize;
-                dctx->litEntropy = 1;
-                if (litEncType==set_compressed) dctx->HUFptr = dctx->entropy.hufTable;
-                memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH);
-                return litCSize + lhSize;
-            }
-
-        case set_basic:
-            {   size_t litSize, lhSize;
-                U32 const lhlCode = ((istart[0]) >> 2) & 3;
-                switch(lhlCode)
-                {
-                case 0: case 2: default:   /* note : default is impossible, since lhlCode into [0..3] */
-                    lhSize = 1;
-                    litSize = istart[0] >> 3;
-                    break;
-                case 1:
-                    lhSize = 2;
-                    litSize = MEM_readLE16(istart) >> 4;
-                    break;
-                case 3:
-                    lhSize = 3;
-                    litSize = MEM_readLE24(istart) >> 4;
-                    break;
-                }
-
-                if (lhSize+litSize+WILDCOPY_OVERLENGTH > srcSize) {  /* risk reading beyond src buffer with wildcopy */
-                    if (litSize+lhSize > srcSize) return ERROR(corruption_detected);
-                    memcpy(dctx->litBuffer, istart+lhSize, litSize);
-                    dctx->litPtr = dctx->litBuffer;
-                    dctx->litSize = litSize;
-                    memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH);
-                    return lhSize+litSize;
-                }
-                /* direct reference into compressed stream */
-                dctx->litPtr = istart+lhSize;
-                dctx->litSize = litSize;
-                return lhSize+litSize;
-            }
-
-        case set_rle:
-            {   U32 const lhlCode = ((istart[0]) >> 2) & 3;
-                size_t litSize, lhSize;
-                switch(lhlCode)
-                {
-                case 0: case 2: default:   /* note : default is impossible, since lhlCode into [0..3] */
-                    lhSize = 1;
-                    litSize = istart[0] >> 3;
-                    break;
-                case 1:
-                    lhSize = 2;
-                    litSize = MEM_readLE16(istart) >> 4;
-                    break;
-                case 3:
-                    lhSize = 3;
-                    litSize = MEM_readLE24(istart) >> 4;
-                    if (srcSize<4) return ERROR(corruption_detected);   /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need lhSize+1 = 4 */
-                    break;
-                }
-                if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
-                memset(dctx->litBuffer, istart[lhSize], litSize + WILDCOPY_OVERLENGTH);
-                dctx->litPtr = dctx->litBuffer;
-                dctx->litSize = litSize;
-                return lhSize+1;
-            }
-        default:
-            return ERROR(corruption_detected);   /* impossible */
-        }
-    }
-}
-
-/* Default FSE distribution tables.
- * These are pre-calculated FSE decoding tables using default distributions as defined in specification :
- * https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md#default-distributions
- * They were generated programmatically with following method :
- * - start from default distributions, present in /lib/common/zstd_internal.h
- * - generate tables normally, using ZSTD_buildFSETable()
- * - printout the content of tables
- * - pretify output, report below, test with fuzzer to ensure it's correct */
-
-/* Default FSE distribution table for Literal Lengths */
-static const ZSTD_seqSymbol LL_defaultDTable[(1<<LL_DEFAULTNORMLOG)+1] = {
-     {  1,  1,  1, LL_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
-     /* nextState, nbAddBits, nbBits, baseVal */
-     {  0,  0,  4,    0},  { 16,  0,  4,    0},
-     { 32,  0,  5,    1},  {  0,  0,  5,    3},
-     {  0,  0,  5,    4},  {  0,  0,  5,    6},
-     {  0,  0,  5,    7},  {  0,  0,  5,    9},
-     {  0,  0,  5,   10},  {  0,  0,  5,   12},
-     {  0,  0,  6,   14},  {  0,  1,  5,   16},
-     {  0,  1,  5,   20},  {  0,  1,  5,   22},
-     {  0,  2,  5,   28},  {  0,  3,  5,   32},
-     {  0,  4,  5,   48},  { 32,  6,  5,   64},
-     {  0,  7,  5,  128},  {  0,  8,  6,  256},
-     {  0, 10,  6, 1024},  {  0, 12,  6, 4096},
-     { 32,  0,  4,    0},  {  0,  0,  4,    1},
-     {  0,  0,  5,    2},  { 32,  0,  5,    4},
-     {  0,  0,  5,    5},  { 32,  0,  5,    7},
-     {  0,  0,  5,    8},  { 32,  0,  5,   10},
-     {  0,  0,  5,   11},  {  0,  0,  6,   13},
-     { 32,  1,  5,   16},  {  0,  1,  5,   18},
-     { 32,  1,  5,   22},  {  0,  2,  5,   24},
-     { 32,  3,  5,   32},  {  0,  3,  5,   40},
-     {  0,  6,  4,   64},  { 16,  6,  4,   64},
-     { 32,  7,  5,  128},  {  0,  9,  6,  512},
-     {  0, 11,  6, 2048},  { 48,  0,  4,    0},
-     { 16,  0,  4,    1},  { 32,  0,  5,    2},
-     { 32,  0,  5,    3},  { 32,  0,  5,    5},
-     { 32,  0,  5,    6},  { 32,  0,  5,    8},
-     { 32,  0,  5,    9},  { 32,  0,  5,   11},
-     { 32,  0,  5,   12},  {  0,  0,  6,   15},
-     { 32,  1,  5,   18},  { 32,  1,  5,   20},
-     { 32,  2,  5,   24},  { 32,  2,  5,   28},
-     { 32,  3,  5,   40},  { 32,  4,  5,   48},
-     {  0, 16,  6,65536},  {  0, 15,  6,32768},
-     {  0, 14,  6,16384},  {  0, 13,  6, 8192},
-};   /* LL_defaultDTable */
-
-/* Default FSE distribution table for Offset Codes */
-static const ZSTD_seqSymbol OF_defaultDTable[(1<<OF_DEFAULTNORMLOG)+1] = {
-    {  1,  1,  1, OF_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
-    /* nextState, nbAddBits, nbBits, baseVal */
-    {  0,  0,  5,    0},     {  0,  6,  4,   61},
-    {  0,  9,  5,  509},     {  0, 15,  5,32765},
-    {  0, 21,  5,2097149},   {  0,  3,  5,    5},
-    {  0,  7,  4,  125},     {  0, 12,  5, 4093},
-    {  0, 18,  5,262141},    {  0, 23,  5,8388605},
-    {  0,  5,  5,   29},     {  0,  8,  4,  253},
-    {  0, 14,  5,16381},     {  0, 20,  5,1048573},
-    {  0,  2,  5,    1},     { 16,  7,  4,  125},
-    {  0, 11,  5, 2045},     {  0, 17,  5,131069},
-    {  0, 22,  5,4194301},   {  0,  4,  5,   13},
-    { 16,  8,  4,  253},     {  0, 13,  5, 8189},
-    {  0, 19,  5,524285},    {  0,  1,  5,    1},
-    { 16,  6,  4,   61},     {  0, 10,  5, 1021},
-    {  0, 16,  5,65533},     {  0, 28,  5,268435453},
-    {  0, 27,  5,134217725}, {  0, 26,  5,67108861},
-    {  0, 25,  5,33554429},  {  0, 24,  5,16777213},
-};   /* OF_defaultDTable */
-
-
-/* Default FSE distribution table for Match Lengths */
-static const ZSTD_seqSymbol ML_defaultDTable[(1<<ML_DEFAULTNORMLOG)+1] = {
-    {  1,  1,  1, ML_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
-    /* nextState, nbAddBits, nbBits, baseVal */
-    {  0,  0,  6,    3},  {  0,  0,  4,    4},
-    { 32,  0,  5,    5},  {  0,  0,  5,    6},
-    {  0,  0,  5,    8},  {  0,  0,  5,    9},
-    {  0,  0,  5,   11},  {  0,  0,  6,   13},
-    {  0,  0,  6,   16},  {  0,  0,  6,   19},
-    {  0,  0,  6,   22},  {  0,  0,  6,   25},
-    {  0,  0,  6,   28},  {  0,  0,  6,   31},
-    {  0,  0,  6,   34},  {  0,  1,  6,   37},
-    {  0,  1,  6,   41},  {  0,  2,  6,   47},
-    {  0,  3,  6,   59},  {  0,  4,  6,   83},
-    {  0,  7,  6,  131},  {  0,  9,  6,  515},
-    { 16,  0,  4,    4},  {  0,  0,  4,    5},
-    { 32,  0,  5,    6},  {  0,  0,  5,    7},
-    { 32,  0,  5,    9},  {  0,  0,  5,   10},
-    {  0,  0,  6,   12},  {  0,  0,  6,   15},
-    {  0,  0,  6,   18},  {  0,  0,  6,   21},
-    {  0,  0,  6,   24},  {  0,  0,  6,   27},
-    {  0,  0,  6,   30},  {  0,  0,  6,   33},
-    {  0,  1,  6,   35},  {  0,  1,  6,   39},
-    {  0,  2,  6,   43},  {  0,  3,  6,   51},
-    {  0,  4,  6,   67},  {  0,  5,  6,   99},
-    {  0,  8,  6,  259},  { 32,  0,  4,    4},
-    { 48,  0,  4,    4},  { 16,  0,  4,    5},
-    { 32,  0,  5,    7},  { 32,  0,  5,    8},
-    { 32,  0,  5,   10},  { 32,  0,  5,   11},
-    {  0,  0,  6,   14},  {  0,  0,  6,   17},
-    {  0,  0,  6,   20},  {  0,  0,  6,   23},
-    {  0,  0,  6,   26},  {  0,  0,  6,   29},
-    {  0,  0,  6,   32},  {  0, 16,  6,65539},
-    {  0, 15,  6,32771},  {  0, 14,  6,16387},
-    {  0, 13,  6, 8195},  {  0, 12,  6, 4099},
-    {  0, 11,  6, 2051},  {  0, 10,  6, 1027},
-};   /* ML_defaultDTable */
-
-
-static void ZSTD_buildSeqTable_rle(ZSTD_seqSymbol* dt, U32 baseValue, U32 nbAddBits)
-{
-    void* ptr = dt;
-    ZSTD_seqSymbol_header* const DTableH = (ZSTD_seqSymbol_header*)ptr;
-    ZSTD_seqSymbol* const cell = dt + 1;
-
-    DTableH->tableLog = 0;
-    DTableH->fastMode = 0;
-
-    cell->nbBits = 0;
-    cell->nextState = 0;
-    assert(nbAddBits < 255);
-    cell->nbAdditionalBits = (BYTE)nbAddBits;
-    cell->baseValue = baseValue;
-}
-
-
-/* ZSTD_buildFSETable() :
- * generate FSE decoding table for one symbol (ll, ml or off) */
-static void
-ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
-    const short* normalizedCounter, unsigned maxSymbolValue,
-    const U32* baseValue, const U32* nbAdditionalBits,
-    unsigned tableLog)
-{
-    ZSTD_seqSymbol* const tableDecode = dt+1;
-    U16 symbolNext[MaxSeq+1];
-
-    U32 const maxSV1 = maxSymbolValue + 1;
-    U32 const tableSize = 1 << tableLog;
-    U32 highThreshold = tableSize-1;
-
-    /* Sanity Checks */
-    assert(maxSymbolValue <= MaxSeq);
-    assert(tableLog <= MaxFSELog);
-
-    /* Init, lay down lowprob symbols */
-    {   ZSTD_seqSymbol_header DTableH;
-        DTableH.tableLog = tableLog;
-        DTableH.fastMode = 1;
-        {   S16 const largeLimit= (S16)(1 << (tableLog-1));
-            U32 s;
-            for (s=0; s<maxSV1; s++) {
-                if (normalizedCounter[s]==-1) {
-                    tableDecode[highThreshold--].baseValue = s;
-                    symbolNext[s] = 1;
-                } else {
-                    if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
-                    symbolNext[s] = normalizedCounter[s];
-        }   }   }
-        memcpy(dt, &DTableH, sizeof(DTableH));
-    }
-
-    /* Spread symbols */
-    {   U32 const tableMask = tableSize-1;
-        U32 const step = FSE_TABLESTEP(tableSize);
-        U32 s, position = 0;
-        for (s=0; s<maxSV1; s++) {
-            int i;
-            for (i=0; i<normalizedCounter[s]; i++) {
-                tableDecode[position].baseValue = s;
-                position = (position + step) & tableMask;
-                while (position > highThreshold) position = (position + step) & tableMask;   /* lowprob area */
-        }   }
-        assert(position == 0); /* position must reach all cells once, otherwise normalizedCounter is incorrect */
-    }
-
-    /* Build Decoding table */
-    {   U32 u;
-        for (u=0; u<tableSize; u++) {
-            U32 const symbol = tableDecode[u].baseValue;
-            U32 const nextState = symbolNext[symbol]++;
-            tableDecode[u].nbBits = (BYTE) (tableLog - BIT_highbit32(nextState) );
-            tableDecode[u].nextState = (U16) ( (nextState << tableDecode[u].nbBits) - tableSize);
-            assert(nbAdditionalBits[symbol] < 255);
-            tableDecode[u].nbAdditionalBits = (BYTE)nbAdditionalBits[symbol];
-            tableDecode[u].baseValue = baseValue[symbol];
-    }   }
-}
-
-
-/*! ZSTD_buildSeqTable() :
- * @return : nb bytes read from src,
- *           or an error code if it fails */
-static size_t ZSTD_buildSeqTable(ZSTD_seqSymbol* DTableSpace, const ZSTD_seqSymbol** DTablePtr,
-                                 symbolEncodingType_e type, U32 max, U32 maxLog,
-                                 const void* src, size_t srcSize,
-                                 const U32* baseValue, const U32* nbAdditionalBits,
-                                 const ZSTD_seqSymbol* defaultTable, U32 flagRepeatTable,
-                                 int ddictIsCold, int nbSeq)
-{
-    switch(type)
-    {
-    case set_rle :
-        if (!srcSize) return ERROR(srcSize_wrong);
-        if ( (*(const BYTE*)src) > max) return ERROR(corruption_detected);
-        {   U32 const symbol = *(const BYTE*)src;
-            U32 const baseline = baseValue[symbol];
-            U32 const nbBits = nbAdditionalBits[symbol];
-            ZSTD_buildSeqTable_rle(DTableSpace, baseline, nbBits);
-        }
-        *DTablePtr = DTableSpace;
-        return 1;
-    case set_basic :
-        *DTablePtr = defaultTable;
-        return 0;
-    case set_repeat:
-        if (!flagRepeatTable) return ERROR(corruption_detected);
-        /* prefetch FSE table if used */
-        if (ddictIsCold && (nbSeq > 24 /* heuristic */)) {
-            const void* const pStart = *DTablePtr;
-            size_t const pSize = sizeof(ZSTD_seqSymbol) * (SEQSYMBOL_TABLE_SIZE(maxLog));
-            PREFETCH_AREA(pStart, pSize);
-        }
-        return 0;
-    case set_compressed :
-        {   U32 tableLog;
-            S16 norm[MaxSeq+1];
-            size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize);
-            if (FSE_isError(headerSize)) return ERROR(corruption_detected);
-            if (tableLog > maxLog) return ERROR(corruption_detected);
-            ZSTD_buildFSETable(DTableSpace, norm, max, baseValue, nbAdditionalBits, tableLog);
-            *DTablePtr = DTableSpace;
-            return headerSize;
-        }
-    default :   /* impossible */
-        assert(0);
-        return ERROR(GENERIC);
-    }
-}
-
-static const U32 LL_base[MaxLL+1] = {
-                 0,    1,    2,     3,     4,     5,     6,      7,
-                 8,    9,   10,    11,    12,    13,    14,     15,
-                16,   18,   20,    22,    24,    28,    32,     40,
-                48,   64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
-                0x2000, 0x4000, 0x8000, 0x10000 };
-
-static const U32 OF_base[MaxOff+1] = {
-                 0,        1,       1,       5,     0xD,     0x1D,     0x3D,     0x7D,
-                 0xFD,   0x1FD,   0x3FD,   0x7FD,   0xFFD,   0x1FFD,   0x3FFD,   0x7FFD,
-                 0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD,
-                 0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD, 0x1FFFFFFD, 0x3FFFFFFD, 0x7FFFFFFD };
-
-static const U32 OF_bits[MaxOff+1] = {
-                     0,  1,  2,  3,  4,  5,  6,  7,
-                     8,  9, 10, 11, 12, 13, 14, 15,
-                    16, 17, 18, 19, 20, 21, 22, 23,
-                    24, 25, 26, 27, 28, 29, 30, 31 };
-
-static const U32 ML_base[MaxML+1] = {
-                     3,  4,  5,    6,     7,     8,     9,    10,
-                    11, 12, 13,   14,    15,    16,    17,    18,
-                    19, 20, 21,   22,    23,    24,    25,    26,
-                    27, 28, 29,   30,    31,    32,    33,    34,
-                    35, 37, 39,   41,    43,    47,    51,    59,
-                    67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803,
-                    0x1003, 0x2003, 0x4003, 0x8003, 0x10003 };
-
-/* Hidden delcaration for fullbench */
-size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
-                             const void* src, size_t srcSize);
-
-size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
-                             const void* src, size_t srcSize)
-{
-    const BYTE* const istart = (const BYTE* const)src;
-    const BYTE* const iend = istart + srcSize;
-    const BYTE* ip = istart;
-    int nbSeq;
-    DEBUGLOG(5, "ZSTD_decodeSeqHeaders");
-
-    /* check */
-    if (srcSize < MIN_SEQUENCES_SIZE) return ERROR(srcSize_wrong);
-
-    /* SeqHead */
-    nbSeq = *ip++;
-    if (!nbSeq) { *nbSeqPtr=0; return 1; }
-    if (nbSeq > 0x7F) {
-        if (nbSeq == 0xFF) {
-            if (ip+2 > iend) return ERROR(srcSize_wrong);
-            nbSeq = MEM_readLE16(ip) + LONGNBSEQ, ip+=2;
-        } else {
-            if (ip >= iend) return ERROR(srcSize_wrong);
-            nbSeq = ((nbSeq-0x80)<<8) + *ip++;
-        }
-    }
-    *nbSeqPtr = nbSeq;
-
-    /* FSE table descriptors */
-    if (ip+4 > iend) return ERROR(srcSize_wrong); /* minimum possible size */
-    {   symbolEncodingType_e const LLtype = (symbolEncodingType_e)(*ip >> 6);
-        symbolEncodingType_e const OFtype = (symbolEncodingType_e)((*ip >> 4) & 3);
-        symbolEncodingType_e const MLtype = (symbolEncodingType_e)((*ip >> 2) & 3);
-        ip++;
-
-        /* Build DTables */
-        {   size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr,
-                                                      LLtype, MaxLL, LLFSELog,
-                                                      ip, iend-ip,
-                                                      LL_base, LL_bits,
-                                                      LL_defaultDTable, dctx->fseEntropy,
-                                                      dctx->ddictIsCold, nbSeq);
-            if (ZSTD_isError(llhSize)) return ERROR(corruption_detected);
-            ip += llhSize;
-        }
-
-        {   size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr,
-                                                      OFtype, MaxOff, OffFSELog,
-                                                      ip, iend-ip,
-                                                      OF_base, OF_bits,
-                                                      OF_defaultDTable, dctx->fseEntropy,
-                                                      dctx->ddictIsCold, nbSeq);
-            if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected);
-            ip += ofhSize;
-        }
-
-        {   size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr,
-                                                      MLtype, MaxML, MLFSELog,
-                                                      ip, iend-ip,
-                                                      ML_base, ML_bits,
-                                                      ML_defaultDTable, dctx->fseEntropy,
-                                                      dctx->ddictIsCold, nbSeq);
-            if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected);
-            ip += mlhSize;
-        }
-    }
-
-    /* prefetch dictionary content */
-    if (dctx->ddictIsCold) {
-        size_t const dictSize = (const char*)dctx->prefixStart - (const char*)dctx->virtualStart;
-        size_t const psmin = MIN(dictSize, (size_t)(64*nbSeq) /* heuristic */ );
-        size_t const pSize = MIN(psmin, 128 KB /* protection */ );
-        const void* const pStart = (const char*)dctx->dictEnd - pSize;
-        PREFETCH_AREA(pStart, pSize);
-        dctx->ddictIsCold = 0;
-    }
-
-    return ip-istart;
-}
-
-
-typedef struct {
-    size_t litLength;
-    size_t matchLength;
-    size_t offset;
-    const BYTE* match;
-} seq_t;
-
-typedef struct {
-    size_t state;
-    const ZSTD_seqSymbol* table;
-} ZSTD_fseState;
-
-typedef struct {
-    BIT_DStream_t DStream;
-    ZSTD_fseState stateLL;
-    ZSTD_fseState stateOffb;
-    ZSTD_fseState stateML;
-    size_t prevOffset[ZSTD_REP_NUM];
-    const BYTE* prefixStart;
-    const BYTE* dictEnd;
-    size_t pos;
-} seqState_t;
-
-
-FORCE_NOINLINE
-size_t ZSTD_execSequenceLast7(BYTE* op,
-                              BYTE* const oend, seq_t sequence,
-                              const BYTE** litPtr, const BYTE* const litLimit,
-                              const BYTE* const base, const BYTE* const vBase, const BYTE* const dictEnd)
-{
-    BYTE* const oLitEnd = op + sequence.litLength;
-    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
-    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
-    BYTE* const oend_w = oend - WILDCOPY_OVERLENGTH;
-    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
-    const BYTE* match = oLitEnd - sequence.offset;
-
-    /* check */
-    if (oMatchEnd>oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
-    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* over-read beyond lit buffer */
-    if (oLitEnd <= oend_w) return ERROR(GENERIC);   /* Precondition */
-
-    /* copy literals */
-    if (op < oend_w) {
-        ZSTD_wildcopy(op, *litPtr, oend_w - op);
-        *litPtr += oend_w - op;
-        op = oend_w;
-    }
-    while (op < oLitEnd) *op++ = *(*litPtr)++;
-
-    /* copy Match */
-    if (sequence.offset > (size_t)(oLitEnd - base)) {
-        /* offset beyond prefix */
-        if (sequence.offset > (size_t)(oLitEnd - vBase)) return ERROR(corruption_detected);
-        match = dictEnd - (base-match);
-        if (match + sequence.matchLength <= dictEnd) {
-            memmove(oLitEnd, match, sequence.matchLength);
-            return sequenceLength;
-        }
-        /* span extDict & currentPrefixSegment */
-        {   size_t const length1 = dictEnd - match;
-            memmove(oLitEnd, match, length1);
-            op = oLitEnd + length1;
-            sequence.matchLength -= length1;
-            match = base;
-    }   }
-    while (op < oMatchEnd) *op++ = *match++;
-    return sequenceLength;
-}
-
-
-HINT_INLINE
-size_t ZSTD_execSequence(BYTE* op,
-                         BYTE* const oend, seq_t sequence,
-                         const BYTE** litPtr, const BYTE* const litLimit,
-                         const BYTE* const prefixStart, const BYTE* const virtualStart, const BYTE* const dictEnd)
-{
-    BYTE* const oLitEnd = op + sequence.litLength;
-    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
-    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
-    BYTE* const oend_w = oend - WILDCOPY_OVERLENGTH;
-    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
-    const BYTE* match = oLitEnd - sequence.offset;
-
-    /* check */
-    if (oMatchEnd>oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
-    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* over-read beyond lit buffer */
-    if (oLitEnd>oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, virtualStart, dictEnd);
-
-    /* copy Literals */
-    ZSTD_copy8(op, *litPtr);
-    if (sequence.litLength > 8)
-        ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8);   /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
-    op = oLitEnd;
-    *litPtr = iLitEnd;   /* update for next sequence */
-
-    /* copy Match */
-    if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
-        /* offset beyond prefix -> go into extDict */
-        if (sequence.offset > (size_t)(oLitEnd - virtualStart))
-            return ERROR(corruption_detected);
-        match = dictEnd + (match - prefixStart);
-        if (match + sequence.matchLength <= dictEnd) {
-            memmove(oLitEnd, match, sequence.matchLength);
-            return sequenceLength;
-        }
-        /* span extDict & currentPrefixSegment */
-        {   size_t const length1 = dictEnd - match;
-            memmove(oLitEnd, match, length1);
-            op = oLitEnd + length1;
-            sequence.matchLength -= length1;
-            match = prefixStart;
-            if (op > oend_w || sequence.matchLength < MINMATCH) {
-              U32 i;
-              for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
-              return sequenceLength;
-            }
-    }   }
-    /* Requirement: op <= oend_w && sequence.matchLength >= MINMATCH */
-
-    /* match within prefix */
-    if (sequence.offset < 8) {
-        /* close range match, overlap */
-        static const U32 dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 };   /* added */
-        static const int dec64table[] = { 8, 8, 8, 7, 8, 9,10,11 };   /* subtracted */
-        int const sub2 = dec64table[sequence.offset];
-        op[0] = match[0];
-        op[1] = match[1];
-        op[2] = match[2];
-        op[3] = match[3];
-        match += dec32table[sequence.offset];
-        ZSTD_copy4(op+4, match);
-        match -= sub2;
-    } else {
-        ZSTD_copy8(op, match);
-    }
-    op += 8; match += 8;
-
-    if (oMatchEnd > oend-(16-MINMATCH)) {
-        if (op < oend_w) {
-            ZSTD_wildcopy(op, match, oend_w - op);
-            match += oend_w - op;
-            op = oend_w;
-        }
-        while (op < oMatchEnd) *op++ = *match++;
-    } else {
-        ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8);   /* works even if matchLength < 8 */
-    }
-    return sequenceLength;
-}
-
-
-HINT_INLINE
-size_t ZSTD_execSequenceLong(BYTE* op,
-                             BYTE* const oend, seq_t sequence,
-                             const BYTE** litPtr, const BYTE* const litLimit,
-                             const BYTE* const prefixStart, const BYTE* const dictStart, const BYTE* const dictEnd)
-{
-    BYTE* const oLitEnd = op + sequence.litLength;
-    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
-    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
-    BYTE* const oend_w = oend - WILDCOPY_OVERLENGTH;
-    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
-    const BYTE* match = sequence.match;
-
-    /* check */
-    if (oMatchEnd > oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
-    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* over-read beyond lit buffer */
-    if (oLitEnd > oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, dictStart, dictEnd);
-
-    /* copy Literals */
-    ZSTD_copy8(op, *litPtr);  /* note : op <= oLitEnd <= oend_w == oend - 8 */
-    if (sequence.litLength > 8)
-        ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8);   /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
-    op = oLitEnd;
-    *litPtr = iLitEnd;   /* update for next sequence */
-
-    /* copy Match */
-    if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
-        /* offset beyond prefix */
-        if (sequence.offset > (size_t)(oLitEnd - dictStart)) return ERROR(corruption_detected);
-        if (match + sequence.matchLength <= dictEnd) {
-            memmove(oLitEnd, match, sequence.matchLength);
-            return sequenceLength;
-        }
-        /* span extDict & currentPrefixSegment */
-        {   size_t const length1 = dictEnd - match;
-            memmove(oLitEnd, match, length1);
-            op = oLitEnd + length1;
-            sequence.matchLength -= length1;
-            match = prefixStart;
-            if (op > oend_w || sequence.matchLength < MINMATCH) {
-              U32 i;
-              for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
-              return sequenceLength;
-            }
-    }   }
-    assert(op <= oend_w);
-    assert(sequence.matchLength >= MINMATCH);
-
-    /* match within prefix */
-    if (sequence.offset < 8) {
-        /* close range match, overlap */
-        static const U32 dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 };   /* added */
-        static const int dec64table[] = { 8, 8, 8, 7, 8, 9,10,11 };   /* subtracted */
-        int const sub2 = dec64table[sequence.offset];
-        op[0] = match[0];
-        op[1] = match[1];
-        op[2] = match[2];
-        op[3] = match[3];
-        match += dec32table[sequence.offset];
-        ZSTD_copy4(op+4, match);
-        match -= sub2;
-    } else {
-        ZSTD_copy8(op, match);
-    }
-    op += 8; match += 8;
-
-    if (oMatchEnd > oend-(16-MINMATCH)) {
-        if (op < oend_w) {
-            ZSTD_wildcopy(op, match, oend_w - op);
-            match += oend_w - op;
-            op = oend_w;
-        }
-        while (op < oMatchEnd) *op++ = *match++;
-    } else {
-        ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8);   /* works even if matchLength < 8 */
-    }
-    return sequenceLength;
-}
-
-static void
-ZSTD_initFseState(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD, const ZSTD_seqSymbol* dt)
-{
-    const void* ptr = dt;
-    const ZSTD_seqSymbol_header* const DTableH = (const ZSTD_seqSymbol_header*)ptr;
-    DStatePtr->state = BIT_readBits(bitD, DTableH->tableLog);
-    DEBUGLOG(6, "ZSTD_initFseState : val=%u using %u bits",
-                (U32)DStatePtr->state, DTableH->tableLog);
-    BIT_reloadDStream(bitD);
-    DStatePtr->table = dt + 1;
-}
-
-FORCE_INLINE_TEMPLATE void
-ZSTD_updateFseState(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD)
-{
-    ZSTD_seqSymbol const DInfo = DStatePtr->table[DStatePtr->state];
-    U32 const nbBits = DInfo.nbBits;
-    size_t const lowBits = BIT_readBits(bitD, nbBits);
-    DStatePtr->state = DInfo.nextState + lowBits;
-}
-
-/* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the maximum
- * offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - 1)
- * bits before reloading. This value is the maximum number of bytes we read
- * after reloading when we are decoding long offets.
- */
-#define LONG_OFFSETS_MAX_EXTRA_BITS_32                       \
-    (ZSTD_WINDOWLOG_MAX_32 > STREAM_ACCUMULATOR_MIN_32       \
-        ? ZSTD_WINDOWLOG_MAX_32 - STREAM_ACCUMULATOR_MIN_32  \
-        : 0)
-
-typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=1 } ZSTD_longOffset_e;
-
-FORCE_INLINE_TEMPLATE seq_t
-ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffsets)
-{
-    seq_t seq;
-    U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
-    U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
-    U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
-    U32 const totalBits = llBits+mlBits+ofBits;
-    U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
-    U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
-    U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
-
-    /* sequence */
-    {   size_t offset;
-        if (!ofBits)
-            offset = 0;
-        else {
-            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
-            ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
-            assert(ofBits <= MaxOff);
-            if (MEM_32bits() && longOffsets && (ofBits >= STREAM_ACCUMULATOR_MIN_32)) {
-                U32 const extraBits = ofBits - MIN(ofBits, 32 - seqState->DStream.bitsConsumed);
-                offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
-                BIT_reloadDStream(&seqState->DStream);
-                if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
-                assert(extraBits <= LONG_OFFSETS_MAX_EXTRA_BITS_32);   /* to avoid another reload */
-            } else {
-                offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits/*>0*/);   /* <=  (ZSTD_WINDOWLOG_MAX-1) bits */
-                if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
-            }
-        }
-
-        if (ofBits <= 1) {
-            offset += (llBase==0);
-            if (offset) {
-                size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
-                temp += !temp;   /* 0 is not valid; input is corrupted; force offset to 1 */
-                if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
-                seqState->prevOffset[1] = seqState->prevOffset[0];
-                seqState->prevOffset[0] = offset = temp;
-            } else {  /* offset == 0 */
-                offset = seqState->prevOffset[0];
-            }
-        } else {
-            seqState->prevOffset[2] = seqState->prevOffset[1];
-            seqState->prevOffset[1] = seqState->prevOffset[0];
-            seqState->prevOffset[0] = offset;
-        }
-        seq.offset = offset;
-    }
-
-    seq.matchLength = mlBase
-                    + ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits/*>0*/) : 0);  /* <=  16 bits */
-    if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
-        BIT_reloadDStream(&seqState->DStream);
-    if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
-        BIT_reloadDStream(&seqState->DStream);
-    /* Ensure there are enough bits to read the rest of data in 64-bit mode. */
-    ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
-
-    seq.litLength = llBase
-                  + ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits/*>0*/) : 0);    /* <=  16 bits */
-    if (MEM_32bits())
-        BIT_reloadDStream(&seqState->DStream);
-
-    DEBUGLOG(6, "seq: litL=%u, matchL=%u, offset=%u",
-                (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset);
-
-    /* ANS state update */
-    ZSTD_updateFseState(&seqState->stateLL, &seqState->DStream);    /* <=  9 bits */
-    ZSTD_updateFseState(&seqState->stateML, &seqState->DStream);    /* <=  9 bits */
-    if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);    /* <= 18 bits */
-    ZSTD_updateFseState(&seqState->stateOffb, &seqState->DStream);  /* <=  8 bits */
-
-    return seq;
-}
-
-FORCE_INLINE_TEMPLATE size_t
-ZSTD_decompressSequences_body( ZSTD_DCtx* dctx,
-                               void* dst, size_t maxDstSize,
-                         const void* seqStart, size_t seqSize, int nbSeq,
-                         const ZSTD_longOffset_e isLongOffset)
-{
-    const BYTE* ip = (const BYTE*)seqStart;
-    const BYTE* const iend = ip + seqSize;
-    BYTE* const ostart = (BYTE* const)dst;
-    BYTE* const oend = ostart + maxDstSize;
-    BYTE* op = ostart;
-    const BYTE* litPtr = dctx->litPtr;
-    const BYTE* const litEnd = litPtr + dctx->litSize;
-    const BYTE* const prefixStart = (const BYTE*) (dctx->prefixStart);
-    const BYTE* const vBase = (const BYTE*) (dctx->virtualStart);
-    const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
-    DEBUGLOG(5, "ZSTD_decompressSequences_body");
-
-    /* Regen sequences */
-    if (nbSeq) {
-        seqState_t seqState;
-        dctx->fseEntropy = 1;
-        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
-        CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
-        ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
-        ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
-        ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
-
-        for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq ; ) {
-            nbSeq--;
-            {   seq_t const sequence = ZSTD_decodeSequence(&seqState, isLongOffset);
-                size_t const oneSeqSize = ZSTD_execSequence(op, oend, sequence, &litPtr, litEnd, prefixStart, vBase, dictEnd);
-                DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize);
-                if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
-                op += oneSeqSize;
-        }   }
-
-        /* check if reached exact end */
-        DEBUGLOG(5, "ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : %i", nbSeq);
-        if (nbSeq) return ERROR(corruption_detected);
-        /* save reps for next block */
-        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
-    }
-
-    /* last literal segment */
-    {   size_t const lastLLSize = litEnd - litPtr;
-        if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
-        memcpy(op, litPtr, lastLLSize);
-        op += lastLLSize;
-    }
-
-    return op-ostart;
-}
-
-static size_t
-ZSTD_decompressSequences_default(ZSTD_DCtx* dctx,
-                                 void* dst, size_t maxDstSize,
-                           const void* seqStart, size_t seqSize, int nbSeq,
-                           const ZSTD_longOffset_e isLongOffset)
-{
-    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-
-
-FORCE_INLINE_TEMPLATE seq_t
-ZSTD_decodeSequenceLong(seqState_t* seqState, ZSTD_longOffset_e const longOffsets)
-{
-    seq_t seq;
-    U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
-    U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
-    U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
-    U32 const totalBits = llBits+mlBits+ofBits;
-    U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
-    U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
-    U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
-
-    /* sequence */
-    {   size_t offset;
-        if (!ofBits)
-            offset = 0;
-        else {
-            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
-            ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
-            assert(ofBits <= MaxOff);
-            if (MEM_32bits() && longOffsets) {
-                U32 const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN_32-1);
-                offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
-                if (MEM_32bits() || extraBits) BIT_reloadDStream(&seqState->DStream);
-                if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
-            } else {
-                offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits);   /* <=  (ZSTD_WINDOWLOG_MAX-1) bits */
-                if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
-            }
-        }
-
-        if (ofBits <= 1) {
-            offset += (llBase==0);
-            if (offset) {
-                size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
-                temp += !temp;   /* 0 is not valid; input is corrupted; force offset to 1 */
-                if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
-                seqState->prevOffset[1] = seqState->prevOffset[0];
-                seqState->prevOffset[0] = offset = temp;
-            } else {
-                offset = seqState->prevOffset[0];
-            }
-        } else {
-            seqState->prevOffset[2] = seqState->prevOffset[1];
-            seqState->prevOffset[1] = seqState->prevOffset[0];
-            seqState->prevOffset[0] = offset;
-        }
-        seq.offset = offset;
-    }
-
-    seq.matchLength = mlBase + ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0);  /* <=  16 bits */
-    if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
-        BIT_reloadDStream(&seqState->DStream);
-    if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
-        BIT_reloadDStream(&seqState->DStream);
-    /* Verify that there is enough bits to read the rest of the data in 64-bit mode. */
-    ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
-
-    seq.litLength = llBase + ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits) : 0);    /* <=  16 bits */
-    if (MEM_32bits())
-        BIT_reloadDStream(&seqState->DStream);
-
-    {   size_t const pos = seqState->pos + seq.litLength;
-        const BYTE* const matchBase = (seq.offset > pos) ? seqState->dictEnd : seqState->prefixStart;
-        seq.match = matchBase + pos - seq.offset;  /* note : this operation can overflow when seq.offset is really too large, which can only happen when input is corrupted.
-                                                    * No consequence though : no memory access will occur, overly large offset will be detected in ZSTD_execSequenceLong() */
-        seqState->pos = pos + seq.matchLength;
-    }
-
-    /* ANS state update */
-    ZSTD_updateFseState(&seqState->stateLL, &seqState->DStream);    /* <=  9 bits */
-    ZSTD_updateFseState(&seqState->stateML, &seqState->DStream);    /* <=  9 bits */
-    if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);    /* <= 18 bits */
-    ZSTD_updateFseState(&seqState->stateOffb, &seqState->DStream);  /* <=  8 bits */
-
-    return seq;
-}
-
-FORCE_INLINE_TEMPLATE size_t
-ZSTD_decompressSequencesLong_body(
-                               ZSTD_DCtx* dctx,
-                               void* dst, size_t maxDstSize,
-                         const void* seqStart, size_t seqSize, int nbSeq,
-                         const ZSTD_longOffset_e isLongOffset)
-{
-    const BYTE* ip = (const BYTE*)seqStart;
-    const BYTE* const iend = ip + seqSize;
-    BYTE* const ostart = (BYTE* const)dst;
-    BYTE* const oend = ostart + maxDstSize;
-    BYTE* op = ostart;
-    const BYTE* litPtr = dctx->litPtr;
-    const BYTE* const litEnd = litPtr + dctx->litSize;
-    const BYTE* const prefixStart = (const BYTE*) (dctx->prefixStart);
-    const BYTE* const dictStart = (const BYTE*) (dctx->virtualStart);
-    const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
-
-    /* Regen sequences */
-    if (nbSeq) {
-#define STORED_SEQS 4
-#define STOSEQ_MASK (STORED_SEQS-1)
-#define ADVANCED_SEQS 4
-        seq_t sequences[STORED_SEQS];
-        int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
-        seqState_t seqState;
-        int seqNb;
-        dctx->fseEntropy = 1;
-        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
-        seqState.prefixStart = prefixStart;
-        seqState.pos = (size_t)(op-prefixStart);
-        seqState.dictEnd = dictEnd;
-        CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
-        ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
-        ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
-        ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
-
-        /* prepare in advance */
-        for (seqNb=0; (BIT_reloadDStream(&seqState.DStream) <= BIT_DStream_completed) && (seqNb<seqAdvance); seqNb++) {
-            sequences[seqNb] = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
-        }
-        if (seqNb<seqAdvance) return ERROR(corruption_detected);
-
-        /* decode and decompress */
-        for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && (seqNb<nbSeq) ; seqNb++) {
-            seq_t const sequence = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
-            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[(seqNb-ADVANCED_SEQS) & STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
-            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
-            PREFETCH(sequence.match);  /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
-            sequences[seqNb&STOSEQ_MASK] = sequence;
-            op += oneSeqSize;
-        }
-        if (seqNb<nbSeq) return ERROR(corruption_detected);
-
-        /* finish queue */
-        seqNb -= seqAdvance;
-        for ( ; seqNb<nbSeq ; seqNb++) {
-            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[seqNb&STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
-            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
-            op += oneSeqSize;
-        }
-
-        /* save reps for next block */
-        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
-#undef STORED_SEQS
-#undef STOSEQ_MASK
-#undef ADVANCED_SEQS
-    }
-
-    /* last literal segment */
-    {   size_t const lastLLSize = litEnd - litPtr;
-        if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
-        memcpy(op, litPtr, lastLLSize);
-        op += lastLLSize;
-    }
-
-    return op-ostart;
-}
-
-static size_t
-ZSTD_decompressSequencesLong_default(ZSTD_DCtx* dctx,
-                                 void* dst, size_t maxDstSize,
-                           const void* seqStart, size_t seqSize, int nbSeq,
-                           const ZSTD_longOffset_e isLongOffset)
-{
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-
-
-#if DYNAMIC_BMI2
-
-static TARGET_ATTRIBUTE("bmi2") size_t
-ZSTD_decompressSequences_bmi2(ZSTD_DCtx* dctx,
-                                 void* dst, size_t maxDstSize,
-                           const void* seqStart, size_t seqSize, int nbSeq,
-                           const ZSTD_longOffset_e isLongOffset)
-{
-    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-static TARGET_ATTRIBUTE("bmi2") size_t
-ZSTD_decompressSequencesLong_bmi2(ZSTD_DCtx* dctx,
-                                 void* dst, size_t maxDstSize,
-                           const void* seqStart, size_t seqSize, int nbSeq,
-                           const ZSTD_longOffset_e isLongOffset)
-{
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-#endif
-
-typedef size_t (*ZSTD_decompressSequences_t)(
-    ZSTD_DCtx *dctx, void *dst, size_t maxDstSize,
-    const void *seqStart, size_t seqSize, int nbSeq,
-    const ZSTD_longOffset_e isLongOffset);
-
-static size_t ZSTD_decompressSequences(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
-                                const void* seqStart, size_t seqSize, int nbSeq,
-                                const ZSTD_longOffset_e isLongOffset)
-{
-    DEBUGLOG(5, "ZSTD_decompressSequences");
-#if DYNAMIC_BMI2
-    if (dctx->bmi2) {
-        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-    }
-#endif
-  return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-static size_t ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx,
-                                void* dst, size_t maxDstSize,
-                                const void* seqStart, size_t seqSize, int nbSeq,
-                                const ZSTD_longOffset_e isLongOffset)
-{
-    DEBUGLOG(5, "ZSTD_decompressSequencesLong");
-#if DYNAMIC_BMI2
-    if (dctx->bmi2) {
-        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-    }
-#endif
-  return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
-}
-
-/* ZSTD_getLongOffsetsShare() :
- * condition : offTable must be valid
- * @return : "share" of long offsets (arbitrarily defined as > (1<<23))
- *           compared to maximum possible of (1<<OffFSELog) */
-static unsigned
-ZSTD_getLongOffsetsShare(const ZSTD_seqSymbol* offTable)
-{
-    const void* ptr = offTable;
-    U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
-    const ZSTD_seqSymbol* table = offTable + 1;
-    U32 const max = 1 << tableLog;
-    U32 u, total = 0;
-    DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
-
-    assert(max <= (1 << OffFSELog));  /* max not too large */
-    for (u=0; u<max; u++) {
-        if (table[u].nbAdditionalBits > 22) total += 1;
-    }
-
-    assert(tableLog <= OffFSELog);
-    total <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
-
-    return total;
-}
-
-
-static size_t ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
-                            void* dst, size_t dstCapacity,
-                      const void* src, size_t srcSize, const int frame)
-{   /* blockType == blockCompressed */
-    const BYTE* ip = (const BYTE*)src;
-    /* isLongOffset must be true if there are long offsets.
-     * Offsets are long if they are larger than 2^STREAM_ACCUMULATOR_MIN.
-     * We don't expect that to be the case in 64-bit mode.
-     * In block mode, window size is not known, so we have to be conservative.
-     * (note: but it could be evaluated from current-lowLimit)
-     */
-    ZSTD_longOffset_e const isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (!frame || dctx->fParams.windowSize > (1ULL << STREAM_ACCUMULATOR_MIN)));
-    DEBUGLOG(5, "ZSTD_decompressBlock_internal (size : %u)", (U32)srcSize);
-
-    if (srcSize >= ZSTD_BLOCKSIZE_MAX) return ERROR(srcSize_wrong);
-
-    /* Decode literals section */
-    {   size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize);
-        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : %u", (U32)litCSize);
-        if (ZSTD_isError(litCSize)) return litCSize;
-        ip += litCSize;
-        srcSize -= litCSize;
-    }
-
-    /* Build Decoding Tables */
-    {   int nbSeq;
-        size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, srcSize);
-        if (ZSTD_isError(seqHSize)) return seqHSize;
-        ip += seqHSize;
-        srcSize -= seqHSize;
-
-        if ( (!frame || dctx->fParams.windowSize > (1<<24))
-          && (nbSeq>0) ) {  /* could probably use a larger nbSeq limit */
-            U32 const shareLongOffsets = ZSTD_getLongOffsetsShare(dctx->OFTptr);
-            U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
-            if (shareLongOffsets >= minShare)
-                return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
-        }
-
-        return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
-    }
-}
-
-
-static void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst)
-{
-    if (dst != dctx->previousDstEnd) {   /* not contiguous */
-        dctx->dictEnd = dctx->previousDstEnd;
-        dctx->virtualStart = (const char*)dst - ((const char*)(dctx->previousDstEnd) - (const char*)(dctx->prefixStart));
-        dctx->prefixStart = dst;
-        dctx->previousDstEnd = dst;
-    }
-}
-
-size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
-                            void* dst, size_t dstCapacity,
-                      const void* src, size_t srcSize)
-{
-    size_t dSize;
-    ZSTD_checkContinuity(dctx, dst);
-    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, /* frame */ 0);
-    dctx->previousDstEnd = (char*)dst + dSize;
-    return dSize;
-}
-
-
-/** ZSTD_insertBlock() :
-    insert `src` block into `dctx` history. Useful to track uncompressed blocks. */
-ZSTDLIB_API size_t ZSTD_insertBlock(ZSTD_DCtx* dctx, const void* blockStart, size_t blockSize)
-{
-    ZSTD_checkContinuity(dctx, blockStart);
-    dctx->previousDstEnd = (const char*)blockStart + blockSize;
-    return blockSize;
-}
-
-
-static size_t ZSTD_generateNxBytes(void* dst, size_t dstCapacity, BYTE value, size_t length)
-{
-    if (length > dstCapacity) return ERROR(dstSize_tooSmall);
-    memset(dst, value, length);
-    return length;
-}
-
 /** ZSTD_findFrameCompressedSize() :
  *  compatible with legacy mode
  *  `src` must point to the start of a ZSTD frame, ZSTD legacy frame, or skippable frame
@@ -1806,9 +447,9 @@
     if (ZSTD_isLegacy(src, srcSize))
         return ZSTD_findFrameCompressedSizeLegacy(src, srcSize);
 #endif
-    if ( (srcSize >= ZSTD_skippableHeaderSize)
-      && (MEM_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START ) {
-        return ZSTD_skippableHeaderSize + MEM_readLE32((const BYTE*)src + ZSTD_FRAMEIDSIZE);
+    if ( (srcSize >= ZSTD_SKIPPABLEHEADERSIZE)
+      && (MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START ) {
+        return readSkippableFrameSize(src, srcSize);
     } else {
         const BYTE* ip = (const BYTE*)src;
         const BYTE* const ipstart = ip;
@@ -1848,8 +489,64 @@
     }
 }
 
+
+
+/*-*************************************************************
+ *   Frame decoding
+ ***************************************************************/
+
+
+void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst)
+{
+    if (dst != dctx->previousDstEnd) {   /* not contiguous */
+        dctx->dictEnd = dctx->previousDstEnd;
+        dctx->virtualStart = (const char*)dst - ((const char*)(dctx->previousDstEnd) - (const char*)(dctx->prefixStart));
+        dctx->prefixStart = dst;
+        dctx->previousDstEnd = dst;
+    }
+}
+
+/** ZSTD_insertBlock() :
+    insert `src` block into `dctx` history. Useful to track uncompressed blocks. */
+size_t ZSTD_insertBlock(ZSTD_DCtx* dctx, const void* blockStart, size_t blockSize)
+{
+    ZSTD_checkContinuity(dctx, blockStart);
+    dctx->previousDstEnd = (const char*)blockStart + blockSize;
+    return blockSize;
+}
+
+
+static size_t ZSTD_copyRawBlock(void* dst, size_t dstCapacity,
+                          const void* src, size_t srcSize)
+{
+    DEBUGLOG(5, "ZSTD_copyRawBlock");
+    if (dst == NULL) {
+        if (srcSize == 0) return 0;
+        return ERROR(dstBuffer_null);
+    }
+    if (srcSize > dstCapacity) return ERROR(dstSize_tooSmall);
+    memcpy(dst, src, srcSize);
+    return srcSize;
+}
+
+static size_t ZSTD_setRleBlock(void* dst, size_t dstCapacity,
+                               BYTE b,
+                               size_t regenSize)
+{
+    if (dst == NULL) {
+        if (regenSize == 0) return 0;
+        return ERROR(dstBuffer_null);
+    }
+    if (regenSize > dstCapacity) return ERROR(dstSize_tooSmall);
+    memset(dst, b, regenSize);
+    return regenSize;
+}
+
+
 /*! ZSTD_decompressFrame() :
-*   @dctx must be properly initialized */
+ * @dctx must be properly initialized
+ *  will update *srcPtr and *srcSizePtr,
+ *  to make *srcPtr progress by one frame. */
 static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx,
                                    void* dst, size_t dstCapacity,
                              const void** srcPtr, size_t *srcSizePtr)
@@ -1858,31 +555,33 @@
     BYTE* const ostart = (BYTE* const)dst;
     BYTE* const oend = ostart + dstCapacity;
     BYTE* op = ostart;
-    size_t remainingSize = *srcSizePtr;
+    size_t remainingSrcSize = *srcSizePtr;
+
+    DEBUGLOG(4, "ZSTD_decompressFrame (srcSize:%i)", (int)*srcSizePtr);
 
     /* check */
-    if (remainingSize < ZSTD_frameHeaderSize_min+ZSTD_blockHeaderSize)
+    if (remainingSrcSize < ZSTD_FRAMEHEADERSIZE_MIN+ZSTD_blockHeaderSize)
         return ERROR(srcSize_wrong);
 
     /* Frame Header */
-    {   size_t const frameHeaderSize = ZSTD_frameHeaderSize(ip, ZSTD_frameHeaderSize_prefix);
+    {   size_t const frameHeaderSize = ZSTD_frameHeaderSize(ip, ZSTD_FRAMEHEADERSIZE_PREFIX);
         if (ZSTD_isError(frameHeaderSize)) return frameHeaderSize;
-        if (remainingSize < frameHeaderSize+ZSTD_blockHeaderSize)
+        if (remainingSrcSize < frameHeaderSize+ZSTD_blockHeaderSize)
             return ERROR(srcSize_wrong);
         CHECK_F( ZSTD_decodeFrameHeader(dctx, ip, frameHeaderSize) );
-        ip += frameHeaderSize; remainingSize -= frameHeaderSize;
+        ip += frameHeaderSize; remainingSrcSize -= frameHeaderSize;
     }
 
     /* Loop on each block */
     while (1) {
         size_t decodedSize;
         blockProperties_t blockProperties;
-        size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSize, &blockProperties);
+        size_t const cBlockSize = ZSTD_getcBlockSize(ip, remainingSrcSize, &blockProperties);
         if (ZSTD_isError(cBlockSize)) return cBlockSize;
 
         ip += ZSTD_blockHeaderSize;
-        remainingSize -= ZSTD_blockHeaderSize;
-        if (cBlockSize > remainingSize) return ERROR(srcSize_wrong);
+        remainingSrcSize -= ZSTD_blockHeaderSize;
+        if (cBlockSize > remainingSrcSize) return ERROR(srcSize_wrong);
 
         switch(blockProperties.blockType)
         {
@@ -1893,7 +592,7 @@
             decodedSize = ZSTD_copyRawBlock(op, oend-op, ip, cBlockSize);
             break;
         case bt_rle :
-            decodedSize = ZSTD_generateNxBytes(op, oend-op, *ip, blockProperties.origSize);
+            decodedSize = ZSTD_setRleBlock(op, oend-op, *ip, blockProperties.origSize);
             break;
         case bt_reserved :
         default:
@@ -1905,7 +604,7 @@
             XXH64_update(&dctx->xxhState, op, decodedSize);
         op += decodedSize;
         ip += cBlockSize;
-        remainingSize -= cBlockSize;
+        remainingSrcSize -= cBlockSize;
         if (blockProperties.lastBlock) break;
     }
 
@@ -1916,16 +615,16 @@
     if (dctx->fParams.checksumFlag) { /* Frame content checksum verification */
         U32 const checkCalc = (U32)XXH64_digest(&dctx->xxhState);
         U32 checkRead;
-        if (remainingSize<4) return ERROR(checksum_wrong);
+        if (remainingSrcSize<4) return ERROR(checksum_wrong);
         checkRead = MEM_readLE32(ip);
         if (checkRead != checkCalc) return ERROR(checksum_wrong);
         ip += 4;
-        remainingSize -= 4;
+        remainingSrcSize -= 4;
     }
 
     /* Allow caller to get size read */
     *srcPtr = ip;
-    *srcSizePtr = remainingSize;
+    *srcSizePtr = remainingSrcSize;
     return op-ostart;
 }
 
@@ -1942,11 +641,11 @@
     assert(dict==NULL || ddict==NULL);  /* either dict or ddict set, not both */
 
     if (ddict) {
-        dict = ZSTD_DDictDictContent(ddict);
-        dictSize = ZSTD_DDictDictSize(ddict);
+        dict = ZSTD_DDict_dictContent(ddict);
+        dictSize = ZSTD_DDict_dictSize(ddict);
     }
 
-    while (srcSize >= ZSTD_frameHeaderSize_prefix) {
+    while (srcSize >= ZSTD_FRAMEHEADERSIZE_PREFIX) {
 
 #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT >= 1)
         if (ZSTD_isLegacy(src, srcSize)) {
@@ -1957,7 +656,9 @@
             if (dctx->staticSize) return ERROR(memory_allocation);
 
             decodedSize = ZSTD_decompressLegacy(dst, dstCapacity, src, frameSize, dict, dictSize);
+            if (ZSTD_isError(decodedSize)) return decodedSize;
 
+            assert(decodedSize <=- dstCapacity);
             dst = (BYTE*)dst + decodedSize;
             dstCapacity -= decodedSize;
 
@@ -1970,13 +671,11 @@
 
         {   U32 const magicNumber = MEM_readLE32(src);
             DEBUGLOG(4, "reading magic number %08X (expecting %08X)",
-                        (U32)magicNumber, (U32)ZSTD_MAGICNUMBER);
-            if ((magicNumber & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) {
-                size_t skippableSize;
-                if (srcSize < ZSTD_skippableHeaderSize)
-                    return ERROR(srcSize_wrong);
-                skippableSize = MEM_readLE32((const BYTE*)src + ZSTD_FRAMEIDSIZE)
-                              + ZSTD_skippableHeaderSize;
+                        (unsigned)magicNumber, ZSTD_MAGICNUMBER);
+            if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {
+                size_t const skippableSize = readSkippableFrameSize(src, srcSize);
+                if (ZSTD_isError(skippableSize))
+                    return skippableSize;
                 if (srcSize < skippableSize) return ERROR(srcSize_wrong);
 
                 src = (const BYTE *)src + skippableSize;
@@ -2010,7 +709,7 @@
                 return ERROR(srcSize_wrong);
             }
             if (ZSTD_isError(res)) return res;
-            /* no need to bound check, ZSTD_decompressFrame already has */
+            assert(res <= dstCapacity);
             dst = (BYTE*)dst + res;
             dstCapacity -= res;
         }
@@ -2090,9 +789,10 @@
  *            or an error code, which can be tested using ZSTD_isError() */
 size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
 {
-    DEBUGLOG(5, "ZSTD_decompressContinue (srcSize:%u)", (U32)srcSize);
+    DEBUGLOG(5, "ZSTD_decompressContinue (srcSize:%u)", (unsigned)srcSize);
     /* Sanity check */
-    if (srcSize != dctx->expected) return ERROR(srcSize_wrong);  /* not allowed */
+    if (srcSize != dctx->expected)
+        return ERROR(srcSize_wrong);  /* not allowed */
     if (dstCapacity) ZSTD_checkContinuity(dctx, dst);
 
     switch (dctx->stage)
@@ -2101,9 +801,9 @@
         assert(src != NULL);
         if (dctx->format == ZSTD_f_zstd1) {  /* allows header */
             assert(srcSize >= ZSTD_FRAMEIDSIZE);  /* to read skippable magic number */
-            if ((MEM_readLE32(src) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) {        /* skippable frame */
+            if ((MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {        /* skippable frame */
                 memcpy(dctx->headerBuffer, src, srcSize);
-                dctx->expected = ZSTD_skippableHeaderSize - srcSize;  /* remaining to load to get full skippable frame header */
+                dctx->expected = ZSTD_SKIPPABLEHEADERSIZE - srcSize;  /* remaining to load to get full skippable frame header */
                 dctx->stage = ZSTDds_decodeSkippableHeader;
                 return 0;
         }   }
@@ -2163,19 +863,19 @@
                 rSize = ZSTD_copyRawBlock(dst, dstCapacity, src, srcSize);
                 break;
             case bt_rle :
-                rSize = ZSTD_setRleBlock(dst, dstCapacity, src, srcSize, dctx->rleSize);
+                rSize = ZSTD_setRleBlock(dst, dstCapacity, *(const BYTE*)src, dctx->rleSize);
                 break;
             case bt_reserved :   /* should never happen */
             default:
                 return ERROR(corruption_detected);
             }
             if (ZSTD_isError(rSize)) return rSize;
-            DEBUGLOG(5, "ZSTD_decompressContinue: decoded size from block : %u", (U32)rSize);
+            DEBUGLOG(5, "ZSTD_decompressContinue: decoded size from block : %u", (unsigned)rSize);
             dctx->decodedSize += rSize;
             if (dctx->fParams.checksumFlag) XXH64_update(&dctx->xxhState, dst, rSize);
 
             if (dctx->stage == ZSTDds_decompressLastBlock) {   /* end of frame */
-                DEBUGLOG(4, "ZSTD_decompressContinue: decoded size from frame : %u", (U32)dctx->decodedSize);
+                DEBUGLOG(4, "ZSTD_decompressContinue: decoded size from frame : %u", (unsigned)dctx->decodedSize);
                 if (dctx->fParams.frameContentSize != ZSTD_CONTENTSIZE_UNKNOWN) {
                     if (dctx->decodedSize != dctx->fParams.frameContentSize) {
                         return ERROR(corruption_detected);
@@ -2199,7 +899,7 @@
         assert(srcSize == 4);  /* guaranteed by dctx->expected */
         {   U32 const h32 = (U32)XXH64_digest(&dctx->xxhState);
             U32 const check32 = MEM_readLE32(src);
-            DEBUGLOG(4, "ZSTD_decompressContinue: checksum : calculated %08X :: %08X read", h32, check32);
+            DEBUGLOG(4, "ZSTD_decompressContinue: checksum : calculated %08X :: %08X read", (unsigned)h32, (unsigned)check32);
             if (check32 != h32) return ERROR(checksum_wrong);
             dctx->expected = 0;
             dctx->stage = ZSTDds_getFrameHeaderSize;
@@ -2208,8 +908,8 @@
 
     case ZSTDds_decodeSkippableHeader:
         assert(src != NULL);
-        assert(srcSize <= ZSTD_skippableHeaderSize);
-        memcpy(dctx->headerBuffer + (ZSTD_skippableHeaderSize - srcSize), src, srcSize);   /* complete skippable header */
+        assert(srcSize <= ZSTD_SKIPPABLEHEADERSIZE);
+        memcpy(dctx->headerBuffer + (ZSTD_SKIPPABLEHEADERSIZE - srcSize), src, srcSize);   /* complete skippable header */
         dctx->expected = MEM_readLE32(dctx->headerBuffer + ZSTD_FRAMEIDSIZE);   /* note : dctx->expected can grow seriously large, beyond local buffer size */
         dctx->stage = ZSTDds_skipFrame;
         return 0;
@@ -2220,7 +920,8 @@
         return 0;
 
     default:
-        return ERROR(GENERIC);   /* impossible */
+        assert(0);   /* impossible */
+        return ERROR(GENERIC);   /* some compiler require default to do something */
     }
 }
 
@@ -2234,11 +935,12 @@
     return 0;
 }
 
-/*! ZSTD_loadEntropy() :
+/*! ZSTD_loadDEntropy() :
  *  dict : must point at beginning of a valid zstd dictionary.
  * @return : size of entropy tables read */
-static size_t ZSTD_loadEntropy(ZSTD_entropyDTables_t* entropy,
-                         const void* const dict, size_t const dictSize)
+size_t
+ZSTD_loadDEntropy(ZSTD_entropyDTables_t* entropy,
+                  const void* const dict, size_t const dictSize)
 {
     const BYTE* dictPtr = (const BYTE*)dict;
     const BYTE* const dictEnd = dictPtr + dictSize;
@@ -2252,15 +954,22 @@
     ZSTD_STATIC_ASSERT(sizeof(entropy->LLTable) + sizeof(entropy->OFTable) + sizeof(entropy->MLTable) >= HUF_DECOMPRESS_WORKSPACE_SIZE);
     {   void* const workspace = &entropy->LLTable;   /* use fse tables as temporary workspace; implies fse tables are grouped together */
         size_t const workspaceSize = sizeof(entropy->LLTable) + sizeof(entropy->OFTable) + sizeof(entropy->MLTable);
+#ifdef HUF_FORCE_DECOMPRESS_X1
+        /* in minimal huffman, we always use X1 variants */
+        size_t const hSize = HUF_readDTableX1_wksp(entropy->hufTable,
+                                                dictPtr, dictEnd - dictPtr,
+                                                workspace, workspaceSize);
+#else
         size_t const hSize = HUF_readDTableX2_wksp(entropy->hufTable,
                                                 dictPtr, dictEnd - dictPtr,
                                                 workspace, workspaceSize);
+#endif
         if (HUF_isError(hSize)) return ERROR(dictionary_corrupted);
         dictPtr += hSize;
     }
 
     {   short offcodeNCount[MaxOff+1];
-        U32 offcodeMaxValue = MaxOff, offcodeLog;
+        unsigned offcodeMaxValue = MaxOff, offcodeLog;
         size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
         if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
         if (offcodeMaxValue > MaxOff) return ERROR(dictionary_corrupted);
@@ -2320,7 +1029,7 @@
     dctx->dictID = MEM_readLE32((const char*)dict + ZSTD_FRAMEIDSIZE);
 
     /* load entropy tables */
-    {   size_t const eSize = ZSTD_loadEntropy(&dctx->entropy, dict, dictSize);
+    {   size_t const eSize = ZSTD_loadDEntropy(&dctx->entropy, dict, dictSize);
         if (ZSTD_isError(eSize)) return ERROR(dictionary_corrupted);
         dict = (const char*)dict + eSize;
         dictSize -= eSize;
@@ -2364,209 +1073,25 @@
 
 /* ======   ZSTD_DDict   ====== */
 
-struct ZSTD_DDict_s {
-    void* dictBuffer;
-    const void* dictContent;
-    size_t dictSize;
-    ZSTD_entropyDTables_t entropy;
-    U32 dictID;
-    U32 entropyPresent;
-    ZSTD_customMem cMem;
-};  /* typedef'd to ZSTD_DDict within "zstd.h" */
-
-static const void* ZSTD_DDictDictContent(const ZSTD_DDict* ddict)
-{
-    assert(ddict != NULL);
-    return ddict->dictContent;
-}
-
-static size_t ZSTD_DDictDictSize(const ZSTD_DDict* ddict)
-{
-    assert(ddict != NULL);
-    return ddict->dictSize;
-}
-
 size_t ZSTD_decompressBegin_usingDDict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict)
 {
     DEBUGLOG(4, "ZSTD_decompressBegin_usingDDict");
     assert(dctx != NULL);
     if (ddict) {
-        dctx->ddictIsCold = (dctx->dictEnd != (const char*)ddict->dictContent + ddict->dictSize);
+        const char* const dictStart = (const char*)ZSTD_DDict_dictContent(ddict);
+        size_t const dictSize = ZSTD_DDict_dictSize(ddict);
+        const void* const dictEnd = dictStart + dictSize;
+        dctx->ddictIsCold = (dctx->dictEnd != dictEnd);
         DEBUGLOG(4, "DDict is %s",
                     dctx->ddictIsCold ? "~cold~" : "hot!");
     }
     CHECK_F( ZSTD_decompressBegin(dctx) );
     if (ddict) {   /* NULL ddict is equivalent to no dictionary */
-        dctx->dictID = ddict->dictID;
-        dctx->prefixStart = ddict->dictContent;
-        dctx->virtualStart = ddict->dictContent;
-        dctx->dictEnd = (const BYTE*)ddict->dictContent + ddict->dictSize;
-        dctx->previousDstEnd = dctx->dictEnd;
-        if (ddict->entropyPresent) {
-            dctx->litEntropy = 1;
-            dctx->fseEntropy = 1;
-            dctx->LLTptr = ddict->entropy.LLTable;
-            dctx->MLTptr = ddict->entropy.MLTable;
-            dctx->OFTptr = ddict->entropy.OFTable;
-            dctx->HUFptr = ddict->entropy.hufTable;
-            dctx->entropy.rep[0] = ddict->entropy.rep[0];
-            dctx->entropy.rep[1] = ddict->entropy.rep[1];
-            dctx->entropy.rep[2] = ddict->entropy.rep[2];
-        } else {
-            dctx->litEntropy = 0;
-            dctx->fseEntropy = 0;
-        }
+        ZSTD_copyDDictParameters(dctx, ddict);
     }
     return 0;
 }
 
-static size_t
-ZSTD_loadEntropy_inDDict(ZSTD_DDict* ddict,
-                         ZSTD_dictContentType_e dictContentType)
-{
-    ddict->dictID = 0;
-    ddict->entropyPresent = 0;
-    if (dictContentType == ZSTD_dct_rawContent) return 0;
-
-    if (ddict->dictSize < 8) {
-        if (dictContentType == ZSTD_dct_fullDict)
-            return ERROR(dictionary_corrupted);   /* only accept specified dictionaries */
-        return 0;   /* pure content mode */
-    }
-    {   U32 const magic = MEM_readLE32(ddict->dictContent);
-        if (magic != ZSTD_MAGIC_DICTIONARY) {
-            if (dictContentType == ZSTD_dct_fullDict)
-                return ERROR(dictionary_corrupted);   /* only accept specified dictionaries */
-            return 0;   /* pure content mode */
-        }
-    }
-    ddict->dictID = MEM_readLE32((const char*)ddict->dictContent + ZSTD_FRAMEIDSIZE);
-
-    /* load entropy tables */
-    CHECK_E( ZSTD_loadEntropy(&ddict->entropy,
-                              ddict->dictContent, ddict->dictSize),
-             dictionary_corrupted );
-    ddict->entropyPresent = 1;
-    return 0;
-}
-
-
-static size_t ZSTD_initDDict_internal(ZSTD_DDict* ddict,
-                                      const void* dict, size_t dictSize,
-                                      ZSTD_dictLoadMethod_e dictLoadMethod,
-                                      ZSTD_dictContentType_e dictContentType)
-{
-    if ((dictLoadMethod == ZSTD_dlm_byRef) || (!dict) || (!dictSize)) {
-        ddict->dictBuffer = NULL;
-        ddict->dictContent = dict;
-        if (!dict) dictSize = 0;
-    } else {
-        void* const internalBuffer = ZSTD_malloc(dictSize, ddict->cMem);
-        ddict->dictBuffer = internalBuffer;
-        ddict->dictContent = internalBuffer;
-        if (!internalBuffer) return ERROR(memory_allocation);
-        memcpy(internalBuffer, dict, dictSize);
-    }
-    ddict->dictSize = dictSize;
-    ddict->entropy.hufTable[0] = (HUF_DTable)((HufLog)*0x1000001);  /* cover both little and big endian */
-
-    /* parse dictionary content */
-    CHECK_F( ZSTD_loadEntropy_inDDict(ddict, dictContentType) );
-
-    return 0;
-}
-
-ZSTD_DDict* ZSTD_createDDict_advanced(const void* dict, size_t dictSize,
-                                      ZSTD_dictLoadMethod_e dictLoadMethod,
-                                      ZSTD_dictContentType_e dictContentType,
-                                      ZSTD_customMem customMem)
-{
-    if (!customMem.customAlloc ^ !customMem.customFree) return NULL;
-
-    {   ZSTD_DDict* const ddict = (ZSTD_DDict*) ZSTD_malloc(sizeof(ZSTD_DDict), customMem);
-        if (ddict == NULL) return NULL;
-        ddict->cMem = customMem;
-        {   size_t const initResult = ZSTD_initDDict_internal(ddict,
-                                            dict, dictSize,
-                                            dictLoadMethod, dictContentType);
-            if (ZSTD_isError(initResult)) {
-                ZSTD_freeDDict(ddict);
-                return NULL;
-        }   }
-        return ddict;
-    }
-}
-
-/*! ZSTD_createDDict() :
-*   Create a digested dictionary, to start decompression without startup delay.
-*   `dict` content is copied inside DDict.
-*   Consequently, `dict` can be released after `ZSTD_DDict` creation */
-ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize)
-{
-    ZSTD_customMem const allocator = { NULL, NULL, NULL };
-    return ZSTD_createDDict_advanced(dict, dictSize, ZSTD_dlm_byCopy, ZSTD_dct_auto, allocator);
-}
-
-/*! ZSTD_createDDict_byReference() :
- *  Create a digested dictionary, to start decompression without startup delay.
- *  Dictionary content is simply referenced, it will be accessed during decompression.
- *  Warning : dictBuffer must outlive DDict (DDict must be freed before dictBuffer) */
-ZSTD_DDict* ZSTD_createDDict_byReference(const void* dictBuffer, size_t dictSize)
-{
-    ZSTD_customMem const allocator = { NULL, NULL, NULL };
-    return ZSTD_createDDict_advanced(dictBuffer, dictSize, ZSTD_dlm_byRef, ZSTD_dct_auto, allocator);
-}
-
-
-const ZSTD_DDict* ZSTD_initStaticDDict(
-                                void* sBuffer, size_t sBufferSize,
-                                const void* dict, size_t dictSize,
-                                ZSTD_dictLoadMethod_e dictLoadMethod,
-                                ZSTD_dictContentType_e dictContentType)
-{
-    size_t const neededSpace = sizeof(ZSTD_DDict)
-                             + (dictLoadMethod == ZSTD_dlm_byRef ? 0 : dictSize);
-    ZSTD_DDict* const ddict = (ZSTD_DDict*)sBuffer;
-    assert(sBuffer != NULL);
-    assert(dict != NULL);
-    if ((size_t)sBuffer & 7) return NULL;   /* 8-aligned */
-    if (sBufferSize < neededSpace) return NULL;
-    if (dictLoadMethod == ZSTD_dlm_byCopy) {
-        memcpy(ddict+1, dict, dictSize);  /* local copy */
-        dict = ddict+1;
-    }
-    if (ZSTD_isError( ZSTD_initDDict_internal(ddict,
-                                              dict, dictSize,
-                                              ZSTD_dlm_byRef, dictContentType) ))
-        return NULL;
-    return ddict;
-}
-
-
-size_t ZSTD_freeDDict(ZSTD_DDict* ddict)
-{
-    if (ddict==NULL) return 0;   /* support free on NULL */
-    {   ZSTD_customMem const cMem = ddict->cMem;
-        ZSTD_free(ddict->dictBuffer, cMem);
-        ZSTD_free(ddict, cMem);
-        return 0;
-    }
-}
-
-/*! ZSTD_estimateDDictSize() :
- *  Estimate amount of memory that will be needed to create a dictionary for decompression.
- *  Note : dictionary created by reference using ZSTD_dlm_byRef are smaller */
-size_t ZSTD_estimateDDictSize(size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod)
-{
-    return sizeof(ZSTD_DDict) + (dictLoadMethod == ZSTD_dlm_byRef ? 0 : dictSize);
-}
-
-size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict)
-{
-    if (ddict==NULL) return 0;   /* support sizeof on NULL */
-    return sizeof(*ddict) + (ddict->dictBuffer ? ddict->dictSize : 0) ;
-}
-
 /*! ZSTD_getDictID_fromDict() :
  *  Provides the dictID stored within dictionary.
  *  if @return == 0, the dictionary is not conformant with Zstandard specification.
@@ -2578,16 +1103,6 @@
     return MEM_readLE32((const char*)dict + ZSTD_FRAMEIDSIZE);
 }
 
-/*! ZSTD_getDictID_fromDDict() :
- *  Provides the dictID of the dictionary loaded into `ddict`.
- *  If @return == 0, the dictionary is not conformant to Zstandard specification, or empty.
- *  Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */
-unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict)
-{
-    if (ddict==NULL) return 0;
-    return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize);
-}
-
 /*! ZSTD_getDictID_fromFrame() :
  *  Provides the dictID required to decompresse frame stored within `src`.
  *  If @return == 0, the dictID could not be decoded.
@@ -2695,7 +1210,7 @@
 
 
 /* ZSTD_initDStream_usingDict() :
- * return : expected size, aka ZSTD_frameHeaderSize_prefix.
+ * return : expected size, aka ZSTD_FRAMEHEADERSIZE_PREFIX.
  * this function cannot fail */
 size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize)
 {
@@ -2703,7 +1218,7 @@
     zds->streamStage = zdss_init;
     zds->noForwardProgress = 0;
     CHECK_F( ZSTD_DCtx_loadDictionary(zds, dict, dictSize) );
-    return ZSTD_frameHeaderSize_prefix;
+    return ZSTD_FRAMEHEADERSIZE_PREFIX;
 }
 
 /* note : this variant can't fail */
@@ -2724,7 +1239,7 @@
 }
 
 /* ZSTD_resetDStream() :
- * return : expected size, aka ZSTD_frameHeaderSize_prefix.
+ * return : expected size, aka ZSTD_FRAMEHEADERSIZE_PREFIX.
  * this function cannot fail */
 size_t ZSTD_resetDStream(ZSTD_DStream* dctx)
 {
@@ -2733,23 +1248,9 @@
     dctx->lhSize = dctx->inPos = dctx->outStart = dctx->outEnd = 0;
     dctx->legacyVersion = 0;
     dctx->hostageByte = 0;
-    return ZSTD_frameHeaderSize_prefix;
+    return ZSTD_FRAMEHEADERSIZE_PREFIX;
 }
 
-size_t ZSTD_setDStreamParameter(ZSTD_DStream* dctx,
-                                ZSTD_DStreamParameter_e paramType, unsigned paramValue)
-{
-    if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
-    switch(paramType)
-    {
-        default : return ERROR(parameter_unsupported);
-        case DStream_p_maxWindowSize :
-            DEBUGLOG(4, "setting maxWindowSize = %u KB", paramValue >> 10);
-            dctx->maxWindowSize = paramValue ? paramValue : (U32)(-1);
-            break;
-    }
-    return 0;
-}
 
 size_t ZSTD_DCtx_refDDict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict)
 {
@@ -2758,18 +1259,92 @@
     return 0;
 }
 
+/* ZSTD_DCtx_setMaxWindowSize() :
+ * note : no direct equivalence in ZSTD_DCtx_setParameter,
+ * since this version sets windowSize, and the other sets windowLog */
 size_t ZSTD_DCtx_setMaxWindowSize(ZSTD_DCtx* dctx, size_t maxWindowSize)
 {
+    ZSTD_bounds const bounds = ZSTD_dParam_getBounds(ZSTD_d_windowLogMax);
+    size_t const min = (size_t)1 << bounds.lowerBound;
+    size_t const max = (size_t)1 << bounds.upperBound;
     if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
+    if (maxWindowSize < min) return ERROR(parameter_outOfBound);
+    if (maxWindowSize > max) return ERROR(parameter_outOfBound);
     dctx->maxWindowSize = maxWindowSize;
     return 0;
 }
 
 size_t ZSTD_DCtx_setFormat(ZSTD_DCtx* dctx, ZSTD_format_e format)
 {
-    DEBUGLOG(4, "ZSTD_DCtx_setFormat : %u", (unsigned)format);
+    return ZSTD_DCtx_setParameter(dctx, ZSTD_d_format, format);
+}
+
+ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dParam)
+{
+    ZSTD_bounds bounds = { 0, 0, 0 };
+    switch(dParam) {
+        case ZSTD_d_windowLogMax:
+            bounds.lowerBound = ZSTD_WINDOWLOG_ABSOLUTEMIN;
+            bounds.upperBound = ZSTD_WINDOWLOG_MAX;
+            return bounds;
+        case ZSTD_d_format:
+            bounds.lowerBound = (int)ZSTD_f_zstd1;
+            bounds.upperBound = (int)ZSTD_f_zstd1_magicless;
+            ZSTD_STATIC_ASSERT(ZSTD_f_zstd1 < ZSTD_f_zstd1_magicless);
+            return bounds;
+        default:;
+    }
+    bounds.error = ERROR(parameter_unsupported);
+    return bounds;
+}
+
+/* ZSTD_dParam_withinBounds:
+ * @return 1 if value is within dParam bounds,
+ * 0 otherwise */
+static int ZSTD_dParam_withinBounds(ZSTD_dParameter dParam, int value)
+{
+    ZSTD_bounds const bounds = ZSTD_dParam_getBounds(dParam);
+    if (ZSTD_isError(bounds.error)) return 0;
+    if (value < bounds.lowerBound) return 0;
+    if (value > bounds.upperBound) return 0;
+    return 1;
+}
+
+#define CHECK_DBOUNDS(p,v) {                \
+    if (!ZSTD_dParam_withinBounds(p, v))    \
+        return ERROR(parameter_outOfBound); \
+}
+
+size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_dParameter dParam, int value)
+{
     if (dctx->streamStage != zdss_init) return ERROR(stage_wrong);
-    dctx->format = format;
+    switch(dParam) {
+        case ZSTD_d_windowLogMax:
+            CHECK_DBOUNDS(ZSTD_d_windowLogMax, value);
+            dctx->maxWindowSize = ((size_t)1) << value;
+            return 0;
+        case ZSTD_d_format:
+            CHECK_DBOUNDS(ZSTD_d_format, value);
+            dctx->format = (ZSTD_format_e)value;
+            return 0;
+        default:;
+    }
+    return ERROR(parameter_unsupported);
+}
+
+size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDirective reset)
+{
+    if ( (reset == ZSTD_reset_session_only)
+      || (reset == ZSTD_reset_session_and_parameters) ) {
+        (void)ZSTD_initDStream(dctx);
+    }
+    if ( (reset == ZSTD_reset_parameters)
+      || (reset == ZSTD_reset_session_and_parameters) ) {
+        if (dctx->streamStage != zdss_init)
+            return ERROR(stage_wrong);
+        dctx->format = ZSTD_f_zstd1;
+        dctx->maxWindowSize = ZSTD_MAXWINDOWSIZE_DEFAULT;
+    }
     return 0;
 }
 
@@ -2799,7 +1374,7 @@
 
 size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize)
 {
-    U32 const windowSizeMax = 1U << ZSTD_WINDOWLOG_MAX;   /* note : should be user-selectable */
+    U32 const windowSizeMax = 1U << ZSTD_WINDOWLOG_MAX;   /* note : should be user-selectable, but requires an additional parameter (or a dctx) */
     ZSTD_frameHeader zfh;
     size_t const err = ZSTD_getFrameHeader(&zfh, src, srcSize);
     if (ZSTD_isError(err)) return err;
@@ -2868,8 +1443,8 @@
 #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1)
                     U32 const legacyVersion = ZSTD_isLegacy(istart, iend-istart);
                     if (legacyVersion) {
-                        const void* const dict = zds->ddict ? zds->ddict->dictContent : NULL;
-                        size_t const dictSize = zds->ddict ? zds->ddict->dictSize : 0;
+                        const void* const dict = zds->ddict ? ZSTD_DDict_dictContent(zds->ddict) : NULL;
+                        size_t const dictSize = zds->ddict ? ZSTD_DDict_dictSize(zds->ddict) : 0;
                         DEBUGLOG(5, "ZSTD_decompressStream: detected legacy version v0.%u", legacyVersion);
                         /* legacy support is incompatible with static dctx */
                         if (zds->staticSize) return ERROR(memory_allocation);
@@ -2894,7 +1469,7 @@
                             zds->lhSize += remainingInput;
                         }
                         input->pos = input->size;
-                        return (MAX(ZSTD_frameHeaderSize_min, hSize) - zds->lhSize) + ZSTD_blockHeaderSize;   /* remaining header bytes + next block header */
+                        return (MAX(ZSTD_FRAMEHEADERSIZE_MIN, hSize) - zds->lhSize) + ZSTD_blockHeaderSize;   /* remaining header bytes + next block header */
                     }
                     assert(ip != NULL);
                     memcpy(zds->headerBuffer + zds->lhSize, ip, toLoad); zds->lhSize = hSize; ip += toLoad;
@@ -2922,7 +1497,7 @@
             DEBUGLOG(4, "Consume header");
             CHECK_F(ZSTD_decompressBegin_usingDDict(zds, zds->ddict));
 
-            if ((MEM_readLE32(zds->headerBuffer) & 0xFFFFFFF0U) == ZSTD_MAGIC_SKIPPABLE_START) {  /* skippable frame */
+            if ((MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE_MASK) == ZSTD_MAGIC_SKIPPABLE_START) {  /* skippable frame */
                 zds->expected = MEM_readLE32(zds->headerBuffer + ZSTD_FRAMEIDSIZE);
                 zds->stage = ZSTDds_skipFrame;
             } else {
@@ -3038,7 +1613,9 @@
             someMoreWork = 0;
             break;
 
-        default: return ERROR(GENERIC);   /* impossible */
+        default:
+            assert(0);    /* impossible */
+            return ERROR(GENERIC);   /* some compiler require default to do something */
     }   }
 
     /* result */
@@ -3080,13 +1657,7 @@
     }
 }
 
-
-size_t ZSTD_decompress_generic(ZSTD_DCtx* dctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input)
-{
-    return ZSTD_decompressStream(dctx, output, input);
-}
-
-size_t ZSTD_decompress_generic_simpleArgs (
+size_t ZSTD_decompressStream_simpleArgs (
                             ZSTD_DCtx* dctx,
                             void* dst, size_t dstCapacity, size_t* dstPos,
                       const void* src, size_t srcSize, size_t* srcPos)
@@ -3094,15 +1665,8 @@
     ZSTD_outBuffer output = { dst, dstCapacity, *dstPos };
     ZSTD_inBuffer  input  = { src, srcSize, *srcPos };
     /* ZSTD_compress_generic() will check validity of dstPos and srcPos */
-    size_t const cErr = ZSTD_decompress_generic(dctx, &output, &input);
+    size_t const cErr = ZSTD_decompressStream(dctx, &output, &input);
     *dstPos = output.pos;
     *srcPos = input.pos;
     return cErr;
 }
-
-void ZSTD_DCtx_reset(ZSTD_DCtx* dctx)
-{
-    (void)ZSTD_initDStream(dctx);
-    dctx->format = ZSTD_f_zstd1;
-    dctx->maxWindowSize = ZSTD_MAXWINDOWSIZE_DEFAULT;
-}
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.h b/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.h
new file mode 100644
--- /dev/null
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+
+#ifndef ZSTD_DEC_BLOCK_H
+#define ZSTD_DEC_BLOCK_H
+
+/*-*******************************************************
+ *  Dependencies
+ *********************************************************/
+#include <stddef.h>   /* size_t */
+#include "zstd.h"    /* DCtx, and some public functions */
+#include "zstd_internal.h"  /* blockProperties_t, and some public functions */
+#include "zstd_decompress_internal.h"  /* ZSTD_seqSymbol */
+
+
+/* ===   Prototypes   === */
+
+/* note: prototypes already published within `zstd.h` :
+ * ZSTD_decompressBlock()
+ */
+
+/* note: prototypes already published within `zstd_internal.h` :
+ * ZSTD_getcBlockSize()
+ * ZSTD_decodeSeqHeaders()
+ */
+
+
+/* ZSTD_decompressBlock_internal() :
+ * decompress block, starting at `src`,
+ * into destination buffer `dst`.
+ * @return : decompressed block size,
+ *           or an error code (which can be tested using ZSTD_isError())
+ */
+size_t ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
+                               void* dst, size_t dstCapacity,
+                         const void* src, size_t srcSize, const int frame);
+
+/* ZSTD_buildFSETable() :
+ * generate FSE decoding table for one symbol (ll, ml or off)
+ * this function must be called with valid parameters only
+ * (dt is large enough, normalizedCounter distribution total is a power of 2, max is within range, etc.)
+ * in which case it cannot fail.
+ * Internal use only.
+ */
+void ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
+             const short* normalizedCounter, unsigned maxSymbolValue,
+             const U32* baseValue, const U32* nbAdditionalBits,
+                   unsigned tableLog);
+
+
+#endif /* ZSTD_DEC_BLOCK_H */
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c b/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c
new file mode 100644
--- /dev/null
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress_block.c
@@ -0,0 +1,1307 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+/* zstd_decompress_block :
+ * this module takes care of decompressing _compressed_ block */
+
+/*-*******************************************************
+*  Dependencies
+*********************************************************/
+#include <string.h>      /* memcpy, memmove, memset */
+#include "compiler.h"    /* prefetch */
+#include "cpu.h"         /* bmi2 */
+#include "mem.h"         /* low level memory routines */
+#define FSE_STATIC_LINKING_ONLY
+#include "fse.h"
+#define HUF_STATIC_LINKING_ONLY
+#include "huf.h"
+#include "zstd_internal.h"
+#include "zstd_decompress_internal.h"   /* ZSTD_DCtx */
+#include "zstd_ddict.h"  /* ZSTD_DDictDictContent */
+#include "zstd_decompress_block.h"
+
+/*_*******************************************************
+*  Macros
+**********************************************************/
+
+/* These two optional macros force the use one way or another of the two
+ * ZSTD_decompressSequences implementations. You can't force in both directions
+ * at the same time.
+ */
+#if defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
+    defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
+#error "Cannot force the use of the short and the long ZSTD_decompressSequences variants!"
+#endif
+
+
+/*_*******************************************************
+*  Memory operations
+**********************************************************/
+static void ZSTD_copy4(void* dst, const void* src) { memcpy(dst, src, 4); }
+
+
+/*-*************************************************************
+ *   Block decoding
+ ***************************************************************/
+
+/*! ZSTD_getcBlockSize() :
+ *  Provides the size of compressed block from block header `src` */
+size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
+                          blockProperties_t* bpPtr)
+{
+    if (srcSize < ZSTD_blockHeaderSize) return ERROR(srcSize_wrong);
+    {   U32 const cBlockHeader = MEM_readLE24(src);
+        U32 const cSize = cBlockHeader >> 3;
+        bpPtr->lastBlock = cBlockHeader & 1;
+        bpPtr->blockType = (blockType_e)((cBlockHeader >> 1) & 3);
+        bpPtr->origSize = cSize;   /* only useful for RLE */
+        if (bpPtr->blockType == bt_rle) return 1;
+        if (bpPtr->blockType == bt_reserved) return ERROR(corruption_detected);
+        return cSize;
+    }
+}
+
+
+/* Hidden declaration for fullbench */
+size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
+                          const void* src, size_t srcSize);
+/*! ZSTD_decodeLiteralsBlock() :
+ * @return : nb of bytes read from src (< srcSize )
+ *  note : symbol not declared but exposed for fullbench */
+size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
+                          const void* src, size_t srcSize)   /* note : srcSize < BLOCKSIZE */
+{
+    if (srcSize < MIN_CBLOCK_SIZE) return ERROR(corruption_detected);
+
+    {   const BYTE* const istart = (const BYTE*) src;
+        symbolEncodingType_e const litEncType = (symbolEncodingType_e)(istart[0] & 3);
+
+        switch(litEncType)
+        {
+        case set_repeat:
+            if (dctx->litEntropy==0) return ERROR(dictionary_corrupted);
+            /* fall-through */
+
+        case set_compressed:
+            if (srcSize < 5) return ERROR(corruption_detected);   /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need up to 5 for case 3 */
+            {   size_t lhSize, litSize, litCSize;
+                U32 singleStream=0;
+                U32 const lhlCode = (istart[0] >> 2) & 3;
+                U32 const lhc = MEM_readLE32(istart);
+                size_t hufSuccess;
+                switch(lhlCode)
+                {
+                case 0: case 1: default:   /* note : default is impossible, since lhlCode into [0..3] */
+                    /* 2 - 2 - 10 - 10 */
+                    singleStream = !lhlCode;
+                    lhSize = 3;
+                    litSize  = (lhc >> 4) & 0x3FF;
+                    litCSize = (lhc >> 14) & 0x3FF;
+                    break;
+                case 2:
+                    /* 2 - 2 - 14 - 14 */
+                    lhSize = 4;
+                    litSize  = (lhc >> 4) & 0x3FFF;
+                    litCSize = lhc >> 18;
+                    break;
+                case 3:
+                    /* 2 - 2 - 18 - 18 */
+                    lhSize = 5;
+                    litSize  = (lhc >> 4) & 0x3FFFF;
+                    litCSize = (lhc >> 22) + (istart[4] << 10);
+                    break;
+                }
+                if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
+                if (litCSize + lhSize > srcSize) return ERROR(corruption_detected);
+
+                /* prefetch huffman table if cold */
+                if (dctx->ddictIsCold && (litSize > 768 /* heuristic */)) {
+                    PREFETCH_AREA(dctx->HUFptr, sizeof(dctx->entropy.hufTable));
+                }
+
+                if (litEncType==set_repeat) {
+                    if (singleStream) {
+                        hufSuccess = HUF_decompress1X_usingDTable_bmi2(
+                            dctx->litBuffer, litSize, istart+lhSize, litCSize,
+                            dctx->HUFptr, dctx->bmi2);
+                    } else {
+                        hufSuccess = HUF_decompress4X_usingDTable_bmi2(
+                            dctx->litBuffer, litSize, istart+lhSize, litCSize,
+                            dctx->HUFptr, dctx->bmi2);
+                    }
+                } else {
+                    if (singleStream) {
+#if defined(HUF_FORCE_DECOMPRESS_X2)
+                        hufSuccess = HUF_decompress1X_DCtx_wksp(
+                            dctx->entropy.hufTable, dctx->litBuffer, litSize,
+                            istart+lhSize, litCSize, dctx->workspace,
+                            sizeof(dctx->workspace));
+#else
+                        hufSuccess = HUF_decompress1X1_DCtx_wksp_bmi2(
+                            dctx->entropy.hufTable, dctx->litBuffer, litSize,
+                            istart+lhSize, litCSize, dctx->workspace,
+                            sizeof(dctx->workspace), dctx->bmi2);
+#endif
+                    } else {
+                        hufSuccess = HUF_decompress4X_hufOnly_wksp_bmi2(
+                            dctx->entropy.hufTable, dctx->litBuffer, litSize,
+                            istart+lhSize, litCSize, dctx->workspace,
+                            sizeof(dctx->workspace), dctx->bmi2);
+                    }
+                }
+
+                if (HUF_isError(hufSuccess)) return ERROR(corruption_detected);
+
+                dctx->litPtr = dctx->litBuffer;
+                dctx->litSize = litSize;
+                dctx->litEntropy = 1;
+                if (litEncType==set_compressed) dctx->HUFptr = dctx->entropy.hufTable;
+                memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH);
+                return litCSize + lhSize;
+            }
+
+        case set_basic:
+            {   size_t litSize, lhSize;
+                U32 const lhlCode = ((istart[0]) >> 2) & 3;
+                switch(lhlCode)
+                {
+                case 0: case 2: default:   /* note : default is impossible, since lhlCode into [0..3] */
+                    lhSize = 1;
+                    litSize = istart[0] >> 3;
+                    break;
+                case 1:
+                    lhSize = 2;
+                    litSize = MEM_readLE16(istart) >> 4;
+                    break;
+                case 3:
+                    lhSize = 3;
+                    litSize = MEM_readLE24(istart) >> 4;
+                    break;
+                }
+
+                if (lhSize+litSize+WILDCOPY_OVERLENGTH > srcSize) {  /* risk reading beyond src buffer with wildcopy */
+                    if (litSize+lhSize > srcSize) return ERROR(corruption_detected);
+                    memcpy(dctx->litBuffer, istart+lhSize, litSize);
+                    dctx->litPtr = dctx->litBuffer;
+                    dctx->litSize = litSize;
+                    memset(dctx->litBuffer + dctx->litSize, 0, WILDCOPY_OVERLENGTH);
+                    return lhSize+litSize;
+                }
+                /* direct reference into compressed stream */
+                dctx->litPtr = istart+lhSize;
+                dctx->litSize = litSize;
+                return lhSize+litSize;
+            }
+
+        case set_rle:
+            {   U32 const lhlCode = ((istart[0]) >> 2) & 3;
+                size_t litSize, lhSize;
+                switch(lhlCode)
+                {
+                case 0: case 2: default:   /* note : default is impossible, since lhlCode into [0..3] */
+                    lhSize = 1;
+                    litSize = istart[0] >> 3;
+                    break;
+                case 1:
+                    lhSize = 2;
+                    litSize = MEM_readLE16(istart) >> 4;
+                    break;
+                case 3:
+                    lhSize = 3;
+                    litSize = MEM_readLE24(istart) >> 4;
+                    if (srcSize<4) return ERROR(corruption_detected);   /* srcSize >= MIN_CBLOCK_SIZE == 3; here we need lhSize+1 = 4 */
+                    break;
+                }
+                if (litSize > ZSTD_BLOCKSIZE_MAX) return ERROR(corruption_detected);
+                memset(dctx->litBuffer, istart[lhSize], litSize + WILDCOPY_OVERLENGTH);
+                dctx->litPtr = dctx->litBuffer;
+                dctx->litSize = litSize;
+                return lhSize+1;
+            }
+        default:
+            return ERROR(corruption_detected);   /* impossible */
+        }
+    }
+}
+
+/* Default FSE distribution tables.
+ * These are pre-calculated FSE decoding tables using default distributions as defined in specification :
+ * https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md#default-distributions
+ * They were generated programmatically with following method :
+ * - start from default distributions, present in /lib/common/zstd_internal.h
+ * - generate tables normally, using ZSTD_buildFSETable()
+ * - printout the content of tables
+ * - pretify output, report below, test with fuzzer to ensure it's correct */
+
+/* Default FSE distribution table for Literal Lengths */
+static const ZSTD_seqSymbol LL_defaultDTable[(1<<LL_DEFAULTNORMLOG)+1] = {
+     {  1,  1,  1, LL_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
+     /* nextState, nbAddBits, nbBits, baseVal */
+     {  0,  0,  4,    0},  { 16,  0,  4,    0},
+     { 32,  0,  5,    1},  {  0,  0,  5,    3},
+     {  0,  0,  5,    4},  {  0,  0,  5,    6},
+     {  0,  0,  5,    7},  {  0,  0,  5,    9},
+     {  0,  0,  5,   10},  {  0,  0,  5,   12},
+     {  0,  0,  6,   14},  {  0,  1,  5,   16},
+     {  0,  1,  5,   20},  {  0,  1,  5,   22},
+     {  0,  2,  5,   28},  {  0,  3,  5,   32},
+     {  0,  4,  5,   48},  { 32,  6,  5,   64},
+     {  0,  7,  5,  128},  {  0,  8,  6,  256},
+     {  0, 10,  6, 1024},  {  0, 12,  6, 4096},
+     { 32,  0,  4,    0},  {  0,  0,  4,    1},
+     {  0,  0,  5,    2},  { 32,  0,  5,    4},
+     {  0,  0,  5,    5},  { 32,  0,  5,    7},
+     {  0,  0,  5,    8},  { 32,  0,  5,   10},
+     {  0,  0,  5,   11},  {  0,  0,  6,   13},
+     { 32,  1,  5,   16},  {  0,  1,  5,   18},
+     { 32,  1,  5,   22},  {  0,  2,  5,   24},
+     { 32,  3,  5,   32},  {  0,  3,  5,   40},
+     {  0,  6,  4,   64},  { 16,  6,  4,   64},
+     { 32,  7,  5,  128},  {  0,  9,  6,  512},
+     {  0, 11,  6, 2048},  { 48,  0,  4,    0},
+     { 16,  0,  4,    1},  { 32,  0,  5,    2},
+     { 32,  0,  5,    3},  { 32,  0,  5,    5},
+     { 32,  0,  5,    6},  { 32,  0,  5,    8},
+     { 32,  0,  5,    9},  { 32,  0,  5,   11},
+     { 32,  0,  5,   12},  {  0,  0,  6,   15},
+     { 32,  1,  5,   18},  { 32,  1,  5,   20},
+     { 32,  2,  5,   24},  { 32,  2,  5,   28},
+     { 32,  3,  5,   40},  { 32,  4,  5,   48},
+     {  0, 16,  6,65536},  {  0, 15,  6,32768},
+     {  0, 14,  6,16384},  {  0, 13,  6, 8192},
+};   /* LL_defaultDTable */
+
+/* Default FSE distribution table for Offset Codes */
+static const ZSTD_seqSymbol OF_defaultDTable[(1<<OF_DEFAULTNORMLOG)+1] = {
+    {  1,  1,  1, OF_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
+    /* nextState, nbAddBits, nbBits, baseVal */
+    {  0,  0,  5,    0},     {  0,  6,  4,   61},
+    {  0,  9,  5,  509},     {  0, 15,  5,32765},
+    {  0, 21,  5,2097149},   {  0,  3,  5,    5},
+    {  0,  7,  4,  125},     {  0, 12,  5, 4093},
+    {  0, 18,  5,262141},    {  0, 23,  5,8388605},
+    {  0,  5,  5,   29},     {  0,  8,  4,  253},
+    {  0, 14,  5,16381},     {  0, 20,  5,1048573},
+    {  0,  2,  5,    1},     { 16,  7,  4,  125},
+    {  0, 11,  5, 2045},     {  0, 17,  5,131069},
+    {  0, 22,  5,4194301},   {  0,  4,  5,   13},
+    { 16,  8,  4,  253},     {  0, 13,  5, 8189},
+    {  0, 19,  5,524285},    {  0,  1,  5,    1},
+    { 16,  6,  4,   61},     {  0, 10,  5, 1021},
+    {  0, 16,  5,65533},     {  0, 28,  5,268435453},
+    {  0, 27,  5,134217725}, {  0, 26,  5,67108861},
+    {  0, 25,  5,33554429},  {  0, 24,  5,16777213},
+};   /* OF_defaultDTable */
+
+
+/* Default FSE distribution table for Match Lengths */
+static const ZSTD_seqSymbol ML_defaultDTable[(1<<ML_DEFAULTNORMLOG)+1] = {
+    {  1,  1,  1, ML_DEFAULTNORMLOG},  /* header : fastMode, tableLog */
+    /* nextState, nbAddBits, nbBits, baseVal */
+    {  0,  0,  6,    3},  {  0,  0,  4,    4},
+    { 32,  0,  5,    5},  {  0,  0,  5,    6},
+    {  0,  0,  5,    8},  {  0,  0,  5,    9},
+    {  0,  0,  5,   11},  {  0,  0,  6,   13},
+    {  0,  0,  6,   16},  {  0,  0,  6,   19},
+    {  0,  0,  6,   22},  {  0,  0,  6,   25},
+    {  0,  0,  6,   28},  {  0,  0,  6,   31},
+    {  0,  0,  6,   34},  {  0,  1,  6,   37},
+    {  0,  1,  6,   41},  {  0,  2,  6,   47},
+    {  0,  3,  6,   59},  {  0,  4,  6,   83},
+    {  0,  7,  6,  131},  {  0,  9,  6,  515},
+    { 16,  0,  4,    4},  {  0,  0,  4,    5},
+    { 32,  0,  5,    6},  {  0,  0,  5,    7},
+    { 32,  0,  5,    9},  {  0,  0,  5,   10},
+    {  0,  0,  6,   12},  {  0,  0,  6,   15},
+    {  0,  0,  6,   18},  {  0,  0,  6,   21},
+    {  0,  0,  6,   24},  {  0,  0,  6,   27},
+    {  0,  0,  6,   30},  {  0,  0,  6,   33},
+    {  0,  1,  6,   35},  {  0,  1,  6,   39},
+    {  0,  2,  6,   43},  {  0,  3,  6,   51},
+    {  0,  4,  6,   67},  {  0,  5,  6,   99},
+    {  0,  8,  6,  259},  { 32,  0,  4,    4},
+    { 48,  0,  4,    4},  { 16,  0,  4,    5},
+    { 32,  0,  5,    7},  { 32,  0,  5,    8},
+    { 32,  0,  5,   10},  { 32,  0,  5,   11},
+    {  0,  0,  6,   14},  {  0,  0,  6,   17},
+    {  0,  0,  6,   20},  {  0,  0,  6,   23},
+    {  0,  0,  6,   26},  {  0,  0,  6,   29},
+    {  0,  0,  6,   32},  {  0, 16,  6,65539},
+    {  0, 15,  6,32771},  {  0, 14,  6,16387},
+    {  0, 13,  6, 8195},  {  0, 12,  6, 4099},
+    {  0, 11,  6, 2051},  {  0, 10,  6, 1027},
+};   /* ML_defaultDTable */
+
+
+static void ZSTD_buildSeqTable_rle(ZSTD_seqSymbol* dt, U32 baseValue, U32 nbAddBits)
+{
+    void* ptr = dt;
+    ZSTD_seqSymbol_header* const DTableH = (ZSTD_seqSymbol_header*)ptr;
+    ZSTD_seqSymbol* const cell = dt + 1;
+
+    DTableH->tableLog = 0;
+    DTableH->fastMode = 0;
+
+    cell->nbBits = 0;
+    cell->nextState = 0;
+    assert(nbAddBits < 255);
+    cell->nbAdditionalBits = (BYTE)nbAddBits;
+    cell->baseValue = baseValue;
+}
+
+
+/* ZSTD_buildFSETable() :
+ * generate FSE decoding table for one symbol (ll, ml or off)
+ * cannot fail if input is valid =>
+ * all inputs are presumed validated at this stage */
+void
+ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
+            const short* normalizedCounter, unsigned maxSymbolValue,
+            const U32* baseValue, const U32* nbAdditionalBits,
+            unsigned tableLog)
+{
+    ZSTD_seqSymbol* const tableDecode = dt+1;
+    U16 symbolNext[MaxSeq+1];
+
+    U32 const maxSV1 = maxSymbolValue + 1;
+    U32 const tableSize = 1 << tableLog;
+    U32 highThreshold = tableSize-1;
+
+    /* Sanity Checks */
+    assert(maxSymbolValue <= MaxSeq);
+    assert(tableLog <= MaxFSELog);
+
+    /* Init, lay down lowprob symbols */
+    {   ZSTD_seqSymbol_header DTableH;
+        DTableH.tableLog = tableLog;
+        DTableH.fastMode = 1;
+        {   S16 const largeLimit= (S16)(1 << (tableLog-1));
+            U32 s;
+            for (s=0; s<maxSV1; s++) {
+                if (normalizedCounter[s]==-1) {
+                    tableDecode[highThreshold--].baseValue = s;
+                    symbolNext[s] = 1;
+                } else {
+                    if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
+                    symbolNext[s] = normalizedCounter[s];
+        }   }   }
+        memcpy(dt, &DTableH, sizeof(DTableH));
+    }
+
+    /* Spread symbols */
+    {   U32 const tableMask = tableSize-1;
+        U32 const step = FSE_TABLESTEP(tableSize);
+        U32 s, position = 0;
+        for (s=0; s<maxSV1; s++) {
+            int i;
+            for (i=0; i<normalizedCounter[s]; i++) {
+                tableDecode[position].baseValue = s;
+                position = (position + step) & tableMask;
+                while (position > highThreshold) position = (position + step) & tableMask;   /* lowprob area */
+        }   }
+        assert(position == 0); /* position must reach all cells once, otherwise normalizedCounter is incorrect */
+    }
+
+    /* Build Decoding table */
+    {   U32 u;
+        for (u=0; u<tableSize; u++) {
+            U32 const symbol = tableDecode[u].baseValue;
+            U32 const nextState = symbolNext[symbol]++;
+            tableDecode[u].nbBits = (BYTE) (tableLog - BIT_highbit32(nextState) );
+            tableDecode[u].nextState = (U16) ( (nextState << tableDecode[u].nbBits) - tableSize);
+            assert(nbAdditionalBits[symbol] < 255);
+            tableDecode[u].nbAdditionalBits = (BYTE)nbAdditionalBits[symbol];
+            tableDecode[u].baseValue = baseValue[symbol];
+    }   }
+}
+
+
+/*! ZSTD_buildSeqTable() :
+ * @return : nb bytes read from src,
+ *           or an error code if it fails */
+static size_t ZSTD_buildSeqTable(ZSTD_seqSymbol* DTableSpace, const ZSTD_seqSymbol** DTablePtr,
+                                 symbolEncodingType_e type, unsigned max, U32 maxLog,
+                                 const void* src, size_t srcSize,
+                                 const U32* baseValue, const U32* nbAdditionalBits,
+                                 const ZSTD_seqSymbol* defaultTable, U32 flagRepeatTable,
+                                 int ddictIsCold, int nbSeq)
+{
+    switch(type)
+    {
+    case set_rle :
+        if (!srcSize) return ERROR(srcSize_wrong);
+        if ( (*(const BYTE*)src) > max) return ERROR(corruption_detected);
+        {   U32 const symbol = *(const BYTE*)src;
+            U32 const baseline = baseValue[symbol];
+            U32 const nbBits = nbAdditionalBits[symbol];
+            ZSTD_buildSeqTable_rle(DTableSpace, baseline, nbBits);
+        }
+        *DTablePtr = DTableSpace;
+        return 1;
+    case set_basic :
+        *DTablePtr = defaultTable;
+        return 0;
+    case set_repeat:
+        if (!flagRepeatTable) return ERROR(corruption_detected);
+        /* prefetch FSE table if used */
+        if (ddictIsCold && (nbSeq > 24 /* heuristic */)) {
+            const void* const pStart = *DTablePtr;
+            size_t const pSize = sizeof(ZSTD_seqSymbol) * (SEQSYMBOL_TABLE_SIZE(maxLog));
+            PREFETCH_AREA(pStart, pSize);
+        }
+        return 0;
+    case set_compressed :
+        {   unsigned tableLog;
+            S16 norm[MaxSeq+1];
+            size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize);
+            if (FSE_isError(headerSize)) return ERROR(corruption_detected);
+            if (tableLog > maxLog) return ERROR(corruption_detected);
+            ZSTD_buildFSETable(DTableSpace, norm, max, baseValue, nbAdditionalBits, tableLog);
+            *DTablePtr = DTableSpace;
+            return headerSize;
+        }
+    default :   /* impossible */
+        assert(0);
+        return ERROR(GENERIC);
+    }
+}
+
+size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
+                             const void* src, size_t srcSize)
+{
+    const BYTE* const istart = (const BYTE* const)src;
+    const BYTE* const iend = istart + srcSize;
+    const BYTE* ip = istart;
+    int nbSeq;
+    DEBUGLOG(5, "ZSTD_decodeSeqHeaders");
+
+    /* check */
+    if (srcSize < MIN_SEQUENCES_SIZE) return ERROR(srcSize_wrong);
+
+    /* SeqHead */
+    nbSeq = *ip++;
+    if (!nbSeq) {
+        *nbSeqPtr=0;
+        if (srcSize != 1) return ERROR(srcSize_wrong);
+        return 1;
+    }
+    if (nbSeq > 0x7F) {
+        if (nbSeq == 0xFF) {
+            if (ip+2 > iend) return ERROR(srcSize_wrong);
+            nbSeq = MEM_readLE16(ip) + LONGNBSEQ, ip+=2;
+        } else {
+            if (ip >= iend) return ERROR(srcSize_wrong);
+            nbSeq = ((nbSeq-0x80)<<8) + *ip++;
+        }
+    }
+    *nbSeqPtr = nbSeq;
+
+    /* FSE table descriptors */
+    if (ip+4 > iend) return ERROR(srcSize_wrong); /* minimum possible size */
+    {   symbolEncodingType_e const LLtype = (symbolEncodingType_e)(*ip >> 6);
+        symbolEncodingType_e const OFtype = (symbolEncodingType_e)((*ip >> 4) & 3);
+        symbolEncodingType_e const MLtype = (symbolEncodingType_e)((*ip >> 2) & 3);
+        ip++;
+
+        /* Build DTables */
+        {   size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr,
+                                                      LLtype, MaxLL, LLFSELog,
+                                                      ip, iend-ip,
+                                                      LL_base, LL_bits,
+                                                      LL_defaultDTable, dctx->fseEntropy,
+                                                      dctx->ddictIsCold, nbSeq);
+            if (ZSTD_isError(llhSize)) return ERROR(corruption_detected);
+            ip += llhSize;
+        }
+
+        {   size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr,
+                                                      OFtype, MaxOff, OffFSELog,
+                                                      ip, iend-ip,
+                                                      OF_base, OF_bits,
+                                                      OF_defaultDTable, dctx->fseEntropy,
+                                                      dctx->ddictIsCold, nbSeq);
+            if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected);
+            ip += ofhSize;
+        }
+
+        {   size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr,
+                                                      MLtype, MaxML, MLFSELog,
+                                                      ip, iend-ip,
+                                                      ML_base, ML_bits,
+                                                      ML_defaultDTable, dctx->fseEntropy,
+                                                      dctx->ddictIsCold, nbSeq);
+            if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected);
+            ip += mlhSize;
+        }
+    }
+
+    return ip-istart;
+}
+
+
+typedef struct {
+    size_t litLength;
+    size_t matchLength;
+    size_t offset;
+    const BYTE* match;
+} seq_t;
+
+typedef struct {
+    size_t state;
+    const ZSTD_seqSymbol* table;
+} ZSTD_fseState;
+
+typedef struct {
+    BIT_DStream_t DStream;
+    ZSTD_fseState stateLL;
+    ZSTD_fseState stateOffb;
+    ZSTD_fseState stateML;
+    size_t prevOffset[ZSTD_REP_NUM];
+    const BYTE* prefixStart;
+    const BYTE* dictEnd;
+    size_t pos;
+} seqState_t;
+
+
+/* ZSTD_execSequenceLast7():
+ * exceptional case : decompress a match starting within last 7 bytes of output buffer.
+ * requires more careful checks, to ensure there is no overflow.
+ * performance does not matter though.
+ * note : this case is supposed to be never generated "naturally" by reference encoder,
+ *        since in most cases it needs at least 8 bytes to look for a match.
+ *        but it's allowed by the specification. */
+FORCE_NOINLINE
+size_t ZSTD_execSequenceLast7(BYTE* op,
+                              BYTE* const oend, seq_t sequence,
+                              const BYTE** litPtr, const BYTE* const litLimit,
+                              const BYTE* const base, const BYTE* const vBase, const BYTE* const dictEnd)
+{
+    BYTE* const oLitEnd = op + sequence.litLength;
+    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
+    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
+    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
+    const BYTE* match = oLitEnd - sequence.offset;
+
+    /* check */
+    if (oMatchEnd>oend) return ERROR(dstSize_tooSmall);   /* last match must fit within dstBuffer */
+    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* try to read beyond literal buffer */
+
+    /* copy literals */
+    while (op < oLitEnd) *op++ = *(*litPtr)++;
+
+    /* copy Match */
+    if (sequence.offset > (size_t)(oLitEnd - base)) {
+        /* offset beyond prefix */
+        if (sequence.offset > (size_t)(oLitEnd - vBase)) return ERROR(corruption_detected);
+        match = dictEnd - (base-match);
+        if (match + sequence.matchLength <= dictEnd) {
+            memmove(oLitEnd, match, sequence.matchLength);
+            return sequenceLength;
+        }
+        /* span extDict & currentPrefixSegment */
+        {   size_t const length1 = dictEnd - match;
+            memmove(oLitEnd, match, length1);
+            op = oLitEnd + length1;
+            sequence.matchLength -= length1;
+            match = base;
+    }   }
+    while (op < oMatchEnd) *op++ = *match++;
+    return sequenceLength;
+}
+
+
+HINT_INLINE
+size_t ZSTD_execSequence(BYTE* op,
+                         BYTE* const oend, seq_t sequence,
+                         const BYTE** litPtr, const BYTE* const litLimit,
+                         const BYTE* const prefixStart, const BYTE* const virtualStart, const BYTE* const dictEnd)
+{
+    BYTE* const oLitEnd = op + sequence.litLength;
+    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
+    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
+    BYTE* const oend_w = oend - WILDCOPY_OVERLENGTH;
+    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
+    const BYTE* match = oLitEnd - sequence.offset;
+
+    /* check */
+    if (oMatchEnd>oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
+    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* over-read beyond lit buffer */
+    if (oLitEnd>oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, virtualStart, dictEnd);
+
+    /* copy Literals */
+    ZSTD_copy8(op, *litPtr);
+    if (sequence.litLength > 8)
+        ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8);   /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+    op = oLitEnd;
+    *litPtr = iLitEnd;   /* update for next sequence */
+
+    /* copy Match */
+    if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
+        /* offset beyond prefix -> go into extDict */
+        if (sequence.offset > (size_t)(oLitEnd - virtualStart))
+            return ERROR(corruption_detected);
+        match = dictEnd + (match - prefixStart);
+        if (match + sequence.matchLength <= dictEnd) {
+            memmove(oLitEnd, match, sequence.matchLength);
+            return sequenceLength;
+        }
+        /* span extDict & currentPrefixSegment */
+        {   size_t const length1 = dictEnd - match;
+            memmove(oLitEnd, match, length1);
+            op = oLitEnd + length1;
+            sequence.matchLength -= length1;
+            match = prefixStart;
+            if (op > oend_w || sequence.matchLength < MINMATCH) {
+              U32 i;
+              for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
+              return sequenceLength;
+            }
+    }   }
+    /* Requirement: op <= oend_w && sequence.matchLength >= MINMATCH */
+
+    /* match within prefix */
+    if (sequence.offset < 8) {
+        /* close range match, overlap */
+        static const U32 dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 };   /* added */
+        static const int dec64table[] = { 8, 8, 8, 7, 8, 9,10,11 };   /* subtracted */
+        int const sub2 = dec64table[sequence.offset];
+        op[0] = match[0];
+        op[1] = match[1];
+        op[2] = match[2];
+        op[3] = match[3];
+        match += dec32table[sequence.offset];
+        ZSTD_copy4(op+4, match);
+        match -= sub2;
+    } else {
+        ZSTD_copy8(op, match);
+    }
+    op += 8; match += 8;
+
+    if (oMatchEnd > oend-(16-MINMATCH)) {
+        if (op < oend_w) {
+            ZSTD_wildcopy(op, match, oend_w - op);
+            match += oend_w - op;
+            op = oend_w;
+        }
+        while (op < oMatchEnd) *op++ = *match++;
+    } else {
+        ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8);   /* works even if matchLength < 8 */
+    }
+    return sequenceLength;
+}
+
+
+HINT_INLINE
+size_t ZSTD_execSequenceLong(BYTE* op,
+                             BYTE* const oend, seq_t sequence,
+                             const BYTE** litPtr, const BYTE* const litLimit,
+                             const BYTE* const prefixStart, const BYTE* const dictStart, const BYTE* const dictEnd)
+{
+    BYTE* const oLitEnd = op + sequence.litLength;
+    size_t const sequenceLength = sequence.litLength + sequence.matchLength;
+    BYTE* const oMatchEnd = op + sequenceLength;   /* risk : address space overflow (32-bits) */
+    BYTE* const oend_w = oend - WILDCOPY_OVERLENGTH;
+    const BYTE* const iLitEnd = *litPtr + sequence.litLength;
+    const BYTE* match = sequence.match;
+
+    /* check */
+    if (oMatchEnd > oend) return ERROR(dstSize_tooSmall); /* last match must start at a minimum distance of WILDCOPY_OVERLENGTH from oend */
+    if (iLitEnd > litLimit) return ERROR(corruption_detected);   /* over-read beyond lit buffer */
+    if (oLitEnd > oend_w) return ZSTD_execSequenceLast7(op, oend, sequence, litPtr, litLimit, prefixStart, dictStart, dictEnd);
+
+    /* copy Literals */
+    ZSTD_copy8(op, *litPtr);  /* note : op <= oLitEnd <= oend_w == oend - 8 */
+    if (sequence.litLength > 8)
+        ZSTD_wildcopy(op+8, (*litPtr)+8, sequence.litLength - 8);   /* note : since oLitEnd <= oend-WILDCOPY_OVERLENGTH, no risk of overwrite beyond oend */
+    op = oLitEnd;
+    *litPtr = iLitEnd;   /* update for next sequence */
+
+    /* copy Match */
+    if (sequence.offset > (size_t)(oLitEnd - prefixStart)) {
+        /* offset beyond prefix */
+        if (sequence.offset > (size_t)(oLitEnd - dictStart)) return ERROR(corruption_detected);
+        if (match + sequence.matchLength <= dictEnd) {
+            memmove(oLitEnd, match, sequence.matchLength);
+            return sequenceLength;
+        }
+        /* span extDict & currentPrefixSegment */
+        {   size_t const length1 = dictEnd - match;
+            memmove(oLitEnd, match, length1);
+            op = oLitEnd + length1;
+            sequence.matchLength -= length1;
+            match = prefixStart;
+            if (op > oend_w || sequence.matchLength < MINMATCH) {
+              U32 i;
+              for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
+              return sequenceLength;
+            }
+    }   }
+    assert(op <= oend_w);
+    assert(sequence.matchLength >= MINMATCH);
+
+    /* match within prefix */
+    if (sequence.offset < 8) {
+        /* close range match, overlap */
+        static const U32 dec32table[] = { 0, 1, 2, 1, 4, 4, 4, 4 };   /* added */
+        static const int dec64table[] = { 8, 8, 8, 7, 8, 9,10,11 };   /* subtracted */
+        int const sub2 = dec64table[sequence.offset];
+        op[0] = match[0];
+        op[1] = match[1];
+        op[2] = match[2];
+        op[3] = match[3];
+        match += dec32table[sequence.offset];
+        ZSTD_copy4(op+4, match);
+        match -= sub2;
+    } else {
+        ZSTD_copy8(op, match);
+    }
+    op += 8; match += 8;
+
+    if (oMatchEnd > oend-(16-MINMATCH)) {
+        if (op < oend_w) {
+            ZSTD_wildcopy(op, match, oend_w - op);
+            match += oend_w - op;
+            op = oend_w;
+        }
+        while (op < oMatchEnd) *op++ = *match++;
+    } else {
+        ZSTD_wildcopy(op, match, (ptrdiff_t)sequence.matchLength-8);   /* works even if matchLength < 8 */
+    }
+    return sequenceLength;
+}
+
+static void
+ZSTD_initFseState(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD, const ZSTD_seqSymbol* dt)
+{
+    const void* ptr = dt;
+    const ZSTD_seqSymbol_header* const DTableH = (const ZSTD_seqSymbol_header*)ptr;
+    DStatePtr->state = BIT_readBits(bitD, DTableH->tableLog);
+    DEBUGLOG(6, "ZSTD_initFseState : val=%u using %u bits",
+                (U32)DStatePtr->state, DTableH->tableLog);
+    BIT_reloadDStream(bitD);
+    DStatePtr->table = dt + 1;
+}
+
+FORCE_INLINE_TEMPLATE void
+ZSTD_updateFseState(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD)
+{
+    ZSTD_seqSymbol const DInfo = DStatePtr->table[DStatePtr->state];
+    U32 const nbBits = DInfo.nbBits;
+    size_t const lowBits = BIT_readBits(bitD, nbBits);
+    DStatePtr->state = DInfo.nextState + lowBits;
+}
+
+/* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the maximum
+ * offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - 1)
+ * bits before reloading. This value is the maximum number of bytes we read
+ * after reloading when we are decoding long offets.
+ */
+#define LONG_OFFSETS_MAX_EXTRA_BITS_32                       \
+    (ZSTD_WINDOWLOG_MAX_32 > STREAM_ACCUMULATOR_MIN_32       \
+        ? ZSTD_WINDOWLOG_MAX_32 - STREAM_ACCUMULATOR_MIN_32  \
+        : 0)
+
+typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=1 } ZSTD_longOffset_e;
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
+FORCE_INLINE_TEMPLATE seq_t
+ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffsets)
+{
+    seq_t seq;
+    U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
+    U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
+    U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
+    U32 const totalBits = llBits+mlBits+ofBits;
+    U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
+    U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
+    U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
+
+    /* sequence */
+    {   size_t offset;
+        if (!ofBits)
+            offset = 0;
+        else {
+            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
+            ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
+            assert(ofBits <= MaxOff);
+            if (MEM_32bits() && longOffsets && (ofBits >= STREAM_ACCUMULATOR_MIN_32)) {
+                U32 const extraBits = ofBits - MIN(ofBits, 32 - seqState->DStream.bitsConsumed);
+                offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
+                BIT_reloadDStream(&seqState->DStream);
+                if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
+                assert(extraBits <= LONG_OFFSETS_MAX_EXTRA_BITS_32);   /* to avoid another reload */
+            } else {
+                offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits/*>0*/);   /* <=  (ZSTD_WINDOWLOG_MAX-1) bits */
+                if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
+            }
+        }
+
+        if (ofBits <= 1) {
+            offset += (llBase==0);
+            if (offset) {
+                size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
+                temp += !temp;   /* 0 is not valid; input is corrupted; force offset to 1 */
+                if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
+                seqState->prevOffset[1] = seqState->prevOffset[0];
+                seqState->prevOffset[0] = offset = temp;
+            } else {  /* offset == 0 */
+                offset = seqState->prevOffset[0];
+            }
+        } else {
+            seqState->prevOffset[2] = seqState->prevOffset[1];
+            seqState->prevOffset[1] = seqState->prevOffset[0];
+            seqState->prevOffset[0] = offset;
+        }
+        seq.offset = offset;
+    }
+
+    seq.matchLength = mlBase
+                    + ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits/*>0*/) : 0);  /* <=  16 bits */
+    if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
+        BIT_reloadDStream(&seqState->DStream);
+    if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
+        BIT_reloadDStream(&seqState->DStream);
+    /* Ensure there are enough bits to read the rest of data in 64-bit mode. */
+    ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
+
+    seq.litLength = llBase
+                  + ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits/*>0*/) : 0);    /* <=  16 bits */
+    if (MEM_32bits())
+        BIT_reloadDStream(&seqState->DStream);
+
+    DEBUGLOG(6, "seq: litL=%u, matchL=%u, offset=%u",
+                (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset);
+
+    /* ANS state update */
+    ZSTD_updateFseState(&seqState->stateLL, &seqState->DStream);    /* <=  9 bits */
+    ZSTD_updateFseState(&seqState->stateML, &seqState->DStream);    /* <=  9 bits */
+    if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);    /* <= 18 bits */
+    ZSTD_updateFseState(&seqState->stateOffb, &seqState->DStream);  /* <=  8 bits */
+
+    return seq;
+}
+
+FORCE_INLINE_TEMPLATE size_t
+ZSTD_decompressSequences_body( ZSTD_DCtx* dctx,
+                               void* dst, size_t maxDstSize,
+                         const void* seqStart, size_t seqSize, int nbSeq,
+                         const ZSTD_longOffset_e isLongOffset)
+{
+    const BYTE* ip = (const BYTE*)seqStart;
+    const BYTE* const iend = ip + seqSize;
+    BYTE* const ostart = (BYTE* const)dst;
+    BYTE* const oend = ostart + maxDstSize;
+    BYTE* op = ostart;
+    const BYTE* litPtr = dctx->litPtr;
+    const BYTE* const litEnd = litPtr + dctx->litSize;
+    const BYTE* const prefixStart = (const BYTE*) (dctx->prefixStart);
+    const BYTE* const vBase = (const BYTE*) (dctx->virtualStart);
+    const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
+    DEBUGLOG(5, "ZSTD_decompressSequences_body");
+
+    /* Regen sequences */
+    if (nbSeq) {
+        seqState_t seqState;
+        dctx->fseEntropy = 1;
+        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
+        CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
+        ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
+        ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
+        ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
+
+        for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq ; ) {
+            nbSeq--;
+            {   seq_t const sequence = ZSTD_decodeSequence(&seqState, isLongOffset);
+                size_t const oneSeqSize = ZSTD_execSequence(op, oend, sequence, &litPtr, litEnd, prefixStart, vBase, dictEnd);
+                DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize);
+                if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
+                op += oneSeqSize;
+        }   }
+
+        /* check if reached exact end */
+        DEBUGLOG(5, "ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : %i", nbSeq);
+        if (nbSeq) return ERROR(corruption_detected);
+        /* save reps for next block */
+        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
+    }
+
+    /* last literal segment */
+    {   size_t const lastLLSize = litEnd - litPtr;
+        if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
+        memcpy(op, litPtr, lastLLSize);
+        op += lastLLSize;
+    }
+
+    return op-ostart;
+}
+
+static size_t
+ZSTD_decompressSequences_default(ZSTD_DCtx* dctx,
+                                 void* dst, size_t maxDstSize,
+                           const void* seqStart, size_t seqSize, int nbSeq,
+                           const ZSTD_longOffset_e isLongOffset)
+{
+    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
+
+
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
+FORCE_INLINE_TEMPLATE seq_t
+ZSTD_decodeSequenceLong(seqState_t* seqState, ZSTD_longOffset_e const longOffsets)
+{
+    seq_t seq;
+    U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
+    U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
+    U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
+    U32 const totalBits = llBits+mlBits+ofBits;
+    U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
+    U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
+    U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
+
+    /* sequence */
+    {   size_t offset;
+        if (!ofBits)
+            offset = 0;
+        else {
+            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
+            ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
+            assert(ofBits <= MaxOff);
+            if (MEM_32bits() && longOffsets) {
+                U32 const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN_32-1);
+                offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
+                if (MEM_32bits() || extraBits) BIT_reloadDStream(&seqState->DStream);
+                if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
+            } else {
+                offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits);   /* <=  (ZSTD_WINDOWLOG_MAX-1) bits */
+                if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
+            }
+        }
+
+        if (ofBits <= 1) {
+            offset += (llBase==0);
+            if (offset) {
+                size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
+                temp += !temp;   /* 0 is not valid; input is corrupted; force offset to 1 */
+                if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
+                seqState->prevOffset[1] = seqState->prevOffset[0];
+                seqState->prevOffset[0] = offset = temp;
+            } else {
+                offset = seqState->prevOffset[0];
+            }
+        } else {
+            seqState->prevOffset[2] = seqState->prevOffset[1];
+            seqState->prevOffset[1] = seqState->prevOffset[0];
+            seqState->prevOffset[0] = offset;
+        }
+        seq.offset = offset;
+    }
+
+    seq.matchLength = mlBase + ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0);  /* <=  16 bits */
+    if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
+        BIT_reloadDStream(&seqState->DStream);
+    if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
+        BIT_reloadDStream(&seqState->DStream);
+    /* Verify that there is enough bits to read the rest of the data in 64-bit mode. */
+    ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
+
+    seq.litLength = llBase + ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits) : 0);    /* <=  16 bits */
+    if (MEM_32bits())
+        BIT_reloadDStream(&seqState->DStream);
+
+    {   size_t const pos = seqState->pos + seq.litLength;
+        const BYTE* const matchBase = (seq.offset > pos) ? seqState->dictEnd : seqState->prefixStart;
+        seq.match = matchBase + pos - seq.offset;  /* note : this operation can overflow when seq.offset is really too large, which can only happen when input is corrupted.
+                                                    * No consequence though : no memory access will occur, overly large offset will be detected in ZSTD_execSequenceLong() */
+        seqState->pos = pos + seq.matchLength;
+    }
+
+    /* ANS state update */
+    ZSTD_updateFseState(&seqState->stateLL, &seqState->DStream);    /* <=  9 bits */
+    ZSTD_updateFseState(&seqState->stateML, &seqState->DStream);    /* <=  9 bits */
+    if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);    /* <= 18 bits */
+    ZSTD_updateFseState(&seqState->stateOffb, &seqState->DStream);  /* <=  8 bits */
+
+    return seq;
+}
+
+FORCE_INLINE_TEMPLATE size_t
+ZSTD_decompressSequencesLong_body(
+                               ZSTD_DCtx* dctx,
+                               void* dst, size_t maxDstSize,
+                         const void* seqStart, size_t seqSize, int nbSeq,
+                         const ZSTD_longOffset_e isLongOffset)
+{
+    const BYTE* ip = (const BYTE*)seqStart;
+    const BYTE* const iend = ip + seqSize;
+    BYTE* const ostart = (BYTE* const)dst;
+    BYTE* const oend = ostart + maxDstSize;
+    BYTE* op = ostart;
+    const BYTE* litPtr = dctx->litPtr;
+    const BYTE* const litEnd = litPtr + dctx->litSize;
+    const BYTE* const prefixStart = (const BYTE*) (dctx->prefixStart);
+    const BYTE* const dictStart = (const BYTE*) (dctx->virtualStart);
+    const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
+
+    /* Regen sequences */
+    if (nbSeq) {
+#define STORED_SEQS 4
+#define STORED_SEQS_MASK (STORED_SEQS-1)
+#define ADVANCED_SEQS 4
+        seq_t sequences[STORED_SEQS];
+        int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
+        seqState_t seqState;
+        int seqNb;
+        dctx->fseEntropy = 1;
+        { int i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
+        seqState.prefixStart = prefixStart;
+        seqState.pos = (size_t)(op-prefixStart);
+        seqState.dictEnd = dictEnd;
+        assert(iend >= ip);
+        CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
+        ZSTD_initFseState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
+        ZSTD_initFseState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
+        ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
+
+        /* prepare in advance */
+        for (seqNb=0; (BIT_reloadDStream(&seqState.DStream) <= BIT_DStream_completed) && (seqNb<seqAdvance); seqNb++) {
+            sequences[seqNb] = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
+            PREFETCH_L1(sequences[seqNb].match); PREFETCH_L1(sequences[seqNb].match + sequences[seqNb].matchLength - 1); /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
+        }
+        if (seqNb<seqAdvance) return ERROR(corruption_detected);
+
+        /* decode and decompress */
+        for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && (seqNb<nbSeq) ; seqNb++) {
+            seq_t const sequence = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
+            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[(seqNb-ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
+            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
+            PREFETCH_L1(sequence.match); PREFETCH_L1(sequence.match + sequence.matchLength - 1); /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
+            sequences[seqNb & STORED_SEQS_MASK] = sequence;
+            op += oneSeqSize;
+        }
+        if (seqNb<nbSeq) return ERROR(corruption_detected);
+
+        /* finish queue */
+        seqNb -= seqAdvance;
+        for ( ; seqNb<nbSeq ; seqNb++) {
+            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[seqNb&STORED_SEQS_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
+            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
+            op += oneSeqSize;
+        }
+
+        /* save reps for next block */
+        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
+    }
+
+    /* last literal segment */
+    {   size_t const lastLLSize = litEnd - litPtr;
+        if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
+        memcpy(op, litPtr, lastLLSize);
+        op += lastLLSize;
+    }
+
+    return op-ostart;
+}
+
+static size_t
+ZSTD_decompressSequencesLong_default(ZSTD_DCtx* dctx,
+                                 void* dst, size_t maxDstSize,
+                           const void* seqStart, size_t seqSize, int nbSeq,
+                           const ZSTD_longOffset_e isLongOffset)
+{
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
+
+
+
+#if DYNAMIC_BMI2
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
+static TARGET_ATTRIBUTE("bmi2") size_t
+ZSTD_decompressSequences_bmi2(ZSTD_DCtx* dctx,
+                                 void* dst, size_t maxDstSize,
+                           const void* seqStart, size_t seqSize, int nbSeq,
+                           const ZSTD_longOffset_e isLongOffset)
+{
+    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
+static TARGET_ATTRIBUTE("bmi2") size_t
+ZSTD_decompressSequencesLong_bmi2(ZSTD_DCtx* dctx,
+                                 void* dst, size_t maxDstSize,
+                           const void* seqStart, size_t seqSize, int nbSeq,
+                           const ZSTD_longOffset_e isLongOffset)
+{
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
+
+#endif /* DYNAMIC_BMI2 */
+
+typedef size_t (*ZSTD_decompressSequences_t)(
+                            ZSTD_DCtx* dctx,
+                            void* dst, size_t maxDstSize,
+                            const void* seqStart, size_t seqSize, int nbSeq,
+                            const ZSTD_longOffset_e isLongOffset);
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
+static size_t
+ZSTD_decompressSequences(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
+                   const void* seqStart, size_t seqSize, int nbSeq,
+                   const ZSTD_longOffset_e isLongOffset)
+{
+    DEBUGLOG(5, "ZSTD_decompressSequences");
+#if DYNAMIC_BMI2
+    if (dctx->bmi2) {
+        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+    }
+#endif
+  return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
+
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
+/* ZSTD_decompressSequencesLong() :
+ * decompression function triggered when a minimum share of offsets is considered "long",
+ * aka out of cache.
+ * note : "long" definition seems overloaded here, sometimes meaning "wider than bitstream register", and sometimes mearning "farther than memory cache distance".
+ * This function will try to mitigate main memory latency through the use of prefetching */
+static size_t
+ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx,
+                             void* dst, size_t maxDstSize,
+                             const void* seqStart, size_t seqSize, int nbSeq,
+                             const ZSTD_longOffset_e isLongOffset)
+{
+    DEBUGLOG(5, "ZSTD_decompressSequencesLong");
+#if DYNAMIC_BMI2
+    if (dctx->bmi2) {
+        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+    }
+#endif
+  return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
+}
+#endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
+
+
+
+#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
+    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
+/* ZSTD_getLongOffsetsShare() :
+ * condition : offTable must be valid
+ * @return : "share" of long offsets (arbitrarily defined as > (1<<23))
+ *           compared to maximum possible of (1<<OffFSELog) */
+static unsigned
+ZSTD_getLongOffsetsShare(const ZSTD_seqSymbol* offTable)
+{
+    const void* ptr = offTable;
+    U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
+    const ZSTD_seqSymbol* table = offTable + 1;
+    U32 const max = 1 << tableLog;
+    U32 u, total = 0;
+    DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
+
+    assert(max <= (1 << OffFSELog));  /* max not too large */
+    for (u=0; u<max; u++) {
+        if (table[u].nbAdditionalBits > 22) total += 1;
+    }
+
+    assert(tableLog <= OffFSELog);
+    total <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
+
+    return total;
+}
+#endif
+
+
+size_t
+ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
+                              void* dst, size_t dstCapacity,
+                        const void* src, size_t srcSize, const int frame)
+{   /* blockType == blockCompressed */
+    const BYTE* ip = (const BYTE*)src;
+    /* isLongOffset must be true if there are long offsets.
+     * Offsets are long if they are larger than 2^STREAM_ACCUMULATOR_MIN.
+     * We don't expect that to be the case in 64-bit mode.
+     * In block mode, window size is not known, so we have to be conservative.
+     * (note: but it could be evaluated from current-lowLimit)
+     */
+    ZSTD_longOffset_e const isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (!frame || (dctx->fParams.windowSize > (1ULL << STREAM_ACCUMULATOR_MIN))));
+    DEBUGLOG(5, "ZSTD_decompressBlock_internal (size : %u)", (U32)srcSize);
+
+    if (srcSize >= ZSTD_BLOCKSIZE_MAX) return ERROR(srcSize_wrong);
+
+    /* Decode literals section */
+    {   size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize);
+        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : %u", (U32)litCSize);
+        if (ZSTD_isError(litCSize)) return litCSize;
+        ip += litCSize;
+        srcSize -= litCSize;
+    }
+
+    /* Build Decoding Tables */
+    {
+        /* These macros control at build-time which decompressor implementation
+         * we use. If neither is defined, we do some inspection and dispatch at
+         * runtime.
+         */
+#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
+    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
+        int usePrefetchDecoder = dctx->ddictIsCold;
+#endif
+        int nbSeq;
+        size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, srcSize);
+        if (ZSTD_isError(seqHSize)) return seqHSize;
+        ip += seqHSize;
+        srcSize -= seqHSize;
+
+#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
+    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
+        if ( !usePrefetchDecoder
+          && (!frame || (dctx->fParams.windowSize > (1<<24)))
+          && (nbSeq>ADVANCED_SEQS) ) {  /* could probably use a larger nbSeq limit */
+            U32 const shareLongOffsets = ZSTD_getLongOffsetsShare(dctx->OFTptr);
+            U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
+            usePrefetchDecoder = (shareLongOffsets >= minShare);
+        }
+#endif
+
+        dctx->ddictIsCold = 0;
+
+#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
+    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
+        if (usePrefetchDecoder)
+#endif
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
+            return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
+#endif
+
+#ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
+        /* else */
+        return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
+#endif
+    }
+}
+
+
+size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
+                            void* dst, size_t dstCapacity,
+                      const void* src, size_t srcSize)
+{
+    size_t dSize;
+    ZSTD_checkContinuity(dctx, dst);
+    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, /* frame */ 0);
+    dctx->previousDstEnd = (char*)dst + dSize;
+    return dSize;
+}
diff --git a/contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h b/contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h
new file mode 100644
--- /dev/null
+++ b/contrib/python-zstandard/zstd/decompress/zstd_decompress_internal.h
@@ -0,0 +1,168 @@
+/*
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+ * All rights reserved.
+ *
+ * This source code is licensed under both the BSD-style license (found in the
+ * LICENSE file in the root directory of this source tree) and the GPLv2 (found
+ * in the COPYING file in the root directory of this source tree).
+ * You may select, at your option, one of the above-listed licenses.
+ */
+
+
+/* zstd_decompress_internal:
+ * objects and definitions shared within lib/decompress modules */
+
+ #ifndef ZSTD_DECOMPRESS_INTERNAL_H
+ #define ZSTD_DECOMPRESS_INTERNAL_H
+
+
+/*-*******************************************************
+ *  Dependencies
+ *********************************************************/
+#include "mem.h"             /* BYTE, U16, U32 */
+#include "zstd_internal.h"   /* ZSTD_seqSymbol */
+
+
+
+/*-*******************************************************
+ *  Constants
+ *********************************************************/
+static const U32 LL_base[MaxLL+1] = {
+                 0,    1,    2,     3,     4,     5,     6,      7,
+                 8,    9,   10,    11,    12,    13,    14,     15,
+                16,   18,   20,    22,    24,    28,    32,     40,
+                48,   64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
+                0x2000, 0x4000, 0x8000, 0x10000 };
+
+static const U32 OF_base[MaxOff+1] = {
+                 0,        1,       1,       5,     0xD,     0x1D,     0x3D,     0x7D,
+                 0xFD,   0x1FD,   0x3FD,   0x7FD,   0xFFD,   0x1FFD,   0x3FFD,   0x7FFD,
+                 0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD,
+                 0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD, 0x1FFFFFFD, 0x3FFFFFFD, 0x7FFFFFFD };
+
+static const U32 OF_bits[MaxOff+1] = {
+                     0,  1,  2,  3,  4,  5,  6,  7,
+                     8,  9, 10, 11, 12, 13, 14, 15,
+                    16, 17, 18, 19, 20, 21, 22, 23,
+                    24, 25, 26, 27, 28, 29, 30, 31 };
+
+static const U32 ML_base[MaxML+1] = {
+                     3,  4,  5,    6,     7,     8,     9,    10,
+                    11, 12, 13,   14,    15,    16,    17,    18,
+                    19, 20, 21,   22,    23,    24,    25,    26,
+                    27, 28, 29,   30,    31,    32,    33,    34,
+                    35, 37, 39,   41,    43,    47,    51,    59,
+                    67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803,
+                    0x1003, 0x2003, 0x4003, 0x8003, 0x10003 };
+
+
+/*-*******************************************************
+ *  Decompression types
+ *********************************************************/
+ typedef struct {
+     U32 fastMode;
+     U32 tableLog;
+ } ZSTD_seqSymbol_header;
+
+ typedef struct {
+     U16  nextState;
+     BYTE nbAdditionalBits;
+     BYTE nbBits;
+     U32  baseValue;
+ } ZSTD_seqSymbol;
+
+ #define SEQSYMBOL_TABLE_SIZE(log)   (1 + (1 << (log)))
+
+typedef struct {
+    ZSTD_seqSymbol LLTable[SEQSYMBOL_TABLE_SIZE(LLFSELog)];    /* Note : Space reserved for FSE Tables */
+    ZSTD_seqSymbol OFTable[SEQSYMBOL_TABLE_SIZE(OffFSELog)];   /* is also used as temporary workspace while building hufTable during DDict creation */
+    ZSTD_seqSymbol MLTable[SEQSYMBOL_TABLE_SIZE(MLFSELog)];    /* and therefore must be at least HUF_DECOMPRESS_WORKSPACE_SIZE large */
+    HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)];  /* can accommodate HUF_decompress4X */
+    U32 rep[ZSTD_REP_NUM];
+} ZSTD_entropyDTables_t;
+
+typedef enum { ZSTDds_getFrameHeaderSize, ZSTDds_decodeFrameHeader,
+               ZSTDds_decodeBlockHeader, ZSTDds_decompressBlock,
+               ZSTDds_decompressLastBlock, ZSTDds_checkChecksum,
+               ZSTDds_decodeSkippableHeader, ZSTDds_skipFrame } ZSTD_dStage;
+
+typedef enum { zdss_init=0, zdss_loadHeader,
+               zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage;
+
+struct ZSTD_DCtx_s
+{
+    const ZSTD_seqSymbol* LLTptr;
+    const ZSTD_seqSymbol* MLTptr;
+    const ZSTD_seqSymbol* OFTptr;
+    const HUF_DTable* HUFptr;
+    ZSTD_entropyDTables_t entropy;
+    U32 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];   /* space needed when building huffman tables */
+    const void* previousDstEnd;   /* detect continuity */
+    const void* prefixStart;      /* start of current segment */
+    const void* virtualStart;     /* virtual start of previous segment if it was just before current one */
+    const void* dictEnd;          /* end of previous segment */
+    size_t expected;
+    ZSTD_frameHeader fParams;
+    U64 decodedSize;
+    blockType_e bType;            /* used in ZSTD_decompressContinue(), store blockType between block header decoding and block decompression stages */
+    ZSTD_dStage stage;
+    U32 litEntropy;
+    U32 fseEntropy;
+    XXH64_state_t xxhState;
+    size_t headerSize;
+    ZSTD_format_e format;
+    const BYTE* litPtr;
+    ZSTD_customMem customMem;
+    size_t litSize;
+    size_t rleSize;
+    size_t staticSize;
+    int bmi2;                     /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
+
+    /* dictionary */
+    ZSTD_DDict* ddictLocal;
+    const ZSTD_DDict* ddict;     /* set by ZSTD_initDStream_usingDDict(), or ZSTD_DCtx_refDDict() */
+    U32 dictID;
+    int ddictIsCold;             /* if == 1 : dictionary is "new" for working context, and presumed "cold" (not in cpu cache) */
+
+    /* streaming */
+    ZSTD_dStreamStage streamStage;
+    char*  inBuff;
+    size_t inBuffSize;
+    size_t inPos;
+    size_t maxWindowSize;
+    char*  outBuff;
+    size_t outBuffSize;
+    size_t outStart;
+    size_t outEnd;
+    size_t lhSize;
+    void* legacyContext;
+    U32 previousLegacyVersion;
+    U32 legacyVersion;
+    U32 hostageByte;
+    int noForwardProgress;
+
+    /* workspace */
+    BYTE litBuffer[ZSTD_BLOCKSIZE_MAX + WILDCOPY_OVERLENGTH];
+    BYTE headerBuffer[ZSTD_FRAMEHEADERSIZE_MAX];
+};  /* typedef'd to ZSTD_DCtx within "zstd.h" */
+
+
+/*-*******************************************************
+ *  Shared internal functions
+ *********************************************************/
+
+/*! ZSTD_loadDEntropy() :
+ *  dict : must point at beginning of a valid zstd dictionary.
+ * @return : size of entropy tables read */
+size_t ZSTD_loadDEntropy(ZSTD_entropyDTables_t* entropy,
+                   const void* const dict, size_t const dictSize);
+
+/*! ZSTD_checkContinuity() :
+ *  check if next `dst` follows previous position, where decompression ended.
+ *  If yes, do nothing (continue on current segment).
+ *  If not, classify previous segment as "external dictionary", and start a new segment.
+ *  This function cannot fail. */
+void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst);
+
+
+#endif /* ZSTD_DECOMPRESS_INTERNAL_H */
diff --git a/contrib/python-zstandard/zstd/dictBuilder/cover.c b/contrib/python-zstandard/zstd/dictBuilder/cover.c
--- a/contrib/python-zstandard/zstd/dictBuilder/cover.c
+++ b/contrib/python-zstandard/zstd/dictBuilder/cover.c
@@ -39,7 +39,7 @@
 /*-*************************************
 *  Constants
 ***************************************/
-#define COVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((U32)-1) : ((U32)1 GB))
+#define COVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((unsigned)-1) : ((unsigned)1 GB))
 #define DEFAULT_SPLITPOINT 1.0
 
 /*-*************************************
@@ -543,7 +543,7 @@
   if (totalSamplesSize < MAX(d, sizeof(U64)) ||
       totalSamplesSize >= (size_t)COVER_MAX_SAMPLES_SIZE) {
     DISPLAYLEVEL(1, "Total samples size is too large (%u MB), maximum size is %u MB\n",
-                 (U32)(totalSamplesSize>>20), (COVER_MAX_SAMPLES_SIZE >> 20));
+                 (unsigned)(totalSamplesSize>>20), (COVER_MAX_SAMPLES_SIZE >> 20));
     return 0;
   }
   /* Check if there are at least 5 training samples */
@@ -559,9 +559,9 @@
   /* Zero the context */
   memset(ctx, 0, sizeof(*ctx));
   DISPLAYLEVEL(2, "Training on %u samples of total size %u\n", nbTrainSamples,
-               (U32)trainingSamplesSize);
+               (unsigned)trainingSamplesSize);
   DISPLAYLEVEL(2, "Testing on %u samples of total size %u\n", nbTestSamples,
-               (U32)testSamplesSize);
+               (unsigned)testSamplesSize);
   ctx->samples = samples;
   ctx->samplesSizes = samplesSizes;
   ctx->nbSamples = nbSamples;
@@ -639,11 +639,11 @@
   /* Divide the data up into epochs of equal size.
    * We will select at least one segment from each epoch.
    */
-  const U32 epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k / 4));
-  const U32 epochSize = (U32)(ctx->suffixSize / epochs);
+  const unsigned epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k / 4));
+  const unsigned epochSize = (U32)(ctx->suffixSize / epochs);
   size_t epoch;
-  DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n", epochs,
-               epochSize);
+  DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n",
+                epochs, epochSize);
   /* Loop through the epochs until there are no more segments or the dictionary
    * is full.
    */
@@ -670,7 +670,7 @@
     memcpy(dict + tail, ctx->samples + segment.begin, segmentSize);
     DISPLAYUPDATE(
         2, "\r%u%%       ",
-        (U32)(((dictBufferCapacity - tail) * 100) / dictBufferCapacity));
+        (unsigned)(((dictBufferCapacity - tail) * 100) / dictBufferCapacity));
   }
   DISPLAYLEVEL(2, "\r%79s\r", "");
   return tail;
@@ -722,7 +722,7 @@
         samplesBuffer, samplesSizes, nbSamples, parameters.zParams);
     if (!ZSTD_isError(dictionarySize)) {
       DISPLAYLEVEL(2, "Constructed dictionary of size %u\n",
-                   (U32)dictionarySize);
+                   (unsigned)dictionarySize);
     }
     COVER_ctx_destroy(&ctx);
     COVER_map_destroy(&activeDmers);
@@ -868,6 +868,8 @@
         if (!best->dict) {
           best->compressedSize = ERROR(GENERIC);
           best->dictSize = 0;
+          ZSTD_pthread_cond_signal(&best->cond);
+          ZSTD_pthread_mutex_unlock(&best->mutex);
           return;
         }
       }
@@ -1054,7 +1056,7 @@
       }
       /* Print status */
       LOCALDISPLAYUPDATE(displayLevel, 2, "\r%u%%       ",
-                         (U32)((iteration * 100) / kIterations));
+                         (unsigned)((iteration * 100) / kIterations));
       ++iteration;
     }
     COVER_best_wait(&best);
diff --git a/contrib/python-zstandard/zstd/dictBuilder/fastcover.c b/contrib/python-zstandard/zstd/dictBuilder/fastcover.c
--- a/contrib/python-zstandard/zstd/dictBuilder/fastcover.c
+++ b/contrib/python-zstandard/zstd/dictBuilder/fastcover.c
@@ -20,7 +20,7 @@
 /*-*************************************
 *  Constants
 ***************************************/
-#define FASTCOVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((U32)-1) : ((U32)1 GB))
+#define FASTCOVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((unsigned)-1) : ((unsigned)1 GB))
 #define FASTCOVER_MAX_F 31
 #define FASTCOVER_MAX_ACCEL 10
 #define DEFAULT_SPLITPOINT 0.75
@@ -159,15 +159,15 @@
    */
   while (activeSegment.end < end) {
     /* Get hash value of current dmer */
-    const size_t index = FASTCOVER_hashPtrToIndex(ctx->samples + activeSegment.end, f, d);
+    const size_t idx = FASTCOVER_hashPtrToIndex(ctx->samples + activeSegment.end, f, d);
 
     /* Add frequency of this index to score if this is the first occurence of index in active segment */
-    if (segmentFreqs[index] == 0) {
-      activeSegment.score += freqs[index];
+    if (segmentFreqs[idx] == 0) {
+      activeSegment.score += freqs[idx];
     }
     /* Increment end of segment and segmentFreqs*/
     activeSegment.end += 1;
-    segmentFreqs[index] += 1;
+    segmentFreqs[idx] += 1;
     /* If the window is now too large, drop the first position */
     if (activeSegment.end - activeSegment.begin == dmersInK + 1) {
       /* Get hash value of the dmer to be eliminated from active segment */
@@ -309,7 +309,7 @@
     if (totalSamplesSize < MAX(d, sizeof(U64)) ||
         totalSamplesSize >= (size_t)FASTCOVER_MAX_SAMPLES_SIZE) {
         DISPLAYLEVEL(1, "Total samples size is too large (%u MB), maximum size is %u MB\n",
-                    (U32)(totalSamplesSize >> 20), (FASTCOVER_MAX_SAMPLES_SIZE >> 20));
+                    (unsigned)(totalSamplesSize >> 20), (FASTCOVER_MAX_SAMPLES_SIZE >> 20));
         return 0;
     }
 
@@ -328,9 +328,9 @@
     /* Zero the context */
     memset(ctx, 0, sizeof(*ctx));
     DISPLAYLEVEL(2, "Training on %u samples of total size %u\n", nbTrainSamples,
-                    (U32)trainingSamplesSize);
+                    (unsigned)trainingSamplesSize);
     DISPLAYLEVEL(2, "Testing on %u samples of total size %u\n", nbTestSamples,
-                    (U32)testSamplesSize);
+                    (unsigned)testSamplesSize);
 
     ctx->samples = samples;
     ctx->samplesSizes = samplesSizes;
@@ -389,11 +389,11 @@
   /* Divide the data up into epochs of equal size.
    * We will select at least one segment from each epoch.
    */
-  const U32 epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k));
-  const U32 epochSize = (U32)(ctx->nbDmers / epochs);
+  const unsigned epochs = MAX(1, (U32)(dictBufferCapacity / parameters.k));
+  const unsigned epochSize = (U32)(ctx->nbDmers / epochs);
   size_t epoch;
-  DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n", epochs,
-               epochSize);
+  DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n",
+                epochs, epochSize);
   /* Loop through the epochs until there are no more segments or the dictionary
    * is full.
    */
@@ -423,7 +423,7 @@
     memcpy(dict + tail, ctx->samples + segment.begin, segmentSize);
     DISPLAYUPDATE(
         2, "\r%u%%       ",
-        (U32)(((dictBufferCapacity - tail) * 100) / dictBufferCapacity));
+        (unsigned)(((dictBufferCapacity - tail) * 100) / dictBufferCapacity));
   }
   DISPLAYLEVEL(2, "\r%79s\r", "");
   return tail;
@@ -577,7 +577,7 @@
           samplesBuffer, samplesSizes, nbFinalizeSamples, coverParams.zParams);
       if (!ZSTD_isError(dictionarySize)) {
           DISPLAYLEVEL(2, "Constructed dictionary of size %u\n",
-                      (U32)dictionarySize);
+                      (unsigned)dictionarySize);
       }
       FASTCOVER_ctx_destroy(&ctx);
       free(segmentFreqs);
@@ -702,7 +702,7 @@
         }
         /* Print status */
         LOCALDISPLAYUPDATE(displayLevel, 2, "\r%u%%       ",
-                           (U32)((iteration * 100) / kIterations));
+                           (unsigned)((iteration * 100) / kIterations));
         ++iteration;
       }
       COVER_best_wait(&best);
diff --git a/contrib/python-zstandard/zstd/dictBuilder/zdict.c b/contrib/python-zstandard/zstd/dictBuilder/zdict.c
--- a/contrib/python-zstandard/zstd/dictBuilder/zdict.c
+++ b/contrib/python-zstandard/zstd/dictBuilder/zdict.c
@@ -255,15 +255,15 @@
     }
 
     {   int i;
-        U32 searchLength;
+        U32 mml;
         U32 refinedStart = start;
         U32 refinedEnd = end;
 
         DISPLAYLEVEL(4, "\n");
-        DISPLAYLEVEL(4, "found %3u matches of length >= %i at pos %7u  ", (U32)(end-start), MINMATCHLENGTH, (U32)pos);
+        DISPLAYLEVEL(4, "found %3u matches of length >= %i at pos %7u  ", (unsigned)(end-start), MINMATCHLENGTH, (unsigned)pos);
         DISPLAYLEVEL(4, "\n");
 
-        for (searchLength = MINMATCHLENGTH ; ; searchLength++) {
+        for (mml = MINMATCHLENGTH ; ; mml++) {
             BYTE currentChar = 0;
             U32 currentCount = 0;
             U32 currentID = refinedStart;
@@ -271,13 +271,13 @@
             U32 selectedCount = 0;
             U32 selectedID = currentID;
             for (id =refinedStart; id < refinedEnd; id++) {
-                if (b[suffix[id] + searchLength] != currentChar) {
+                if (b[suffix[id] + mml] != currentChar) {
                     if (currentCount > selectedCount) {
                         selectedCount = currentCount;
                         selectedID = currentID;
                     }
                     currentID = id;
-                    currentChar = b[ suffix[id] + searchLength];
+                    currentChar = b[ suffix[id] + mml];
                     currentCount = 0;
                 }
                 currentCount ++;
@@ -342,7 +342,7 @@
             savings[i] = savings[i-1] + (lengthList[i] * (i-3));
 
         DISPLAYLEVEL(4, "Selected dict at position %u, of length %u : saves %u (ratio: %.2f)  \n",
-                     (U32)pos, (U32)maxLength, savings[maxLength], (double)savings[maxLength] / maxLength);
+                     (unsigned)pos, (unsigned)maxLength, (unsigned)savings[maxLength], (double)savings[maxLength] / maxLength);
 
         solution.pos = (U32)pos;
         solution.length = (U32)maxLength;
@@ -497,7 +497,7 @@
 static size_t ZDICT_trainBuffer_legacy(dictItem* dictList, U32 dictListSize,
                             const void* const buffer, size_t bufferSize,   /* buffer must end with noisy guard band */
                             const size_t* fileSizes, unsigned nbFiles,
-                            U32 minRatio, U32 notificationLevel)
+                            unsigned minRatio, U32 notificationLevel)
 {
     int* const suffix0 = (int*)malloc((bufferSize+2)*sizeof(*suffix0));
     int* const suffix = suffix0+1;
@@ -523,11 +523,11 @@
     memset(doneMarks, 0, bufferSize+16);
 
     /* limit sample set size (divsufsort limitation)*/
-    if (bufferSize > ZDICT_MAX_SAMPLES_SIZE) DISPLAYLEVEL(3, "sample set too large : reduced to %u MB ...\n", (U32)(ZDICT_MAX_SAMPLES_SIZE>>20));
+    if (bufferSize > ZDICT_MAX_SAMPLES_SIZE) DISPLAYLEVEL(3, "sample set too large : reduced to %u MB ...\n", (unsigned)(ZDICT_MAX_SAMPLES_SIZE>>20));
     while (bufferSize > ZDICT_MAX_SAMPLES_SIZE) bufferSize -= fileSizes[--nbFiles];
 
     /* sort */
-    DISPLAYLEVEL(2, "sorting %u files of total size %u MB ...\n", nbFiles, (U32)(bufferSize>>20));
+    DISPLAYLEVEL(2, "sorting %u files of total size %u MB ...\n", nbFiles, (unsigned)(bufferSize>>20));
     {   int const divSuftSortResult = divsufsort((const unsigned char*)buffer, suffix, (int)bufferSize, 0);
         if (divSuftSortResult != 0) { result = ERROR(GENERIC); goto _cleanup; }
     }
@@ -589,7 +589,7 @@
 #define MAXREPOFFSET 1024
 
 static void ZDICT_countEStats(EStats_ress_t esr, ZSTD_parameters params,
-                              U32* countLit, U32* offsetcodeCount, U32* matchlengthCount, U32* litlengthCount, U32* repOffsets,
+                              unsigned* countLit, unsigned* offsetcodeCount, unsigned* matchlengthCount, unsigned* litlengthCount, U32* repOffsets,
                               const void* src, size_t srcSize,
                               U32 notificationLevel)
 {
@@ -602,7 +602,7 @@
 
     }
     cSize = ZSTD_compressBlock(esr.zc, esr.workPlace, ZSTD_BLOCKSIZE_MAX, src, srcSize);
-    if (ZSTD_isError(cSize)) { DISPLAYLEVEL(3, "warning : could not compress sample size %u \n", (U32)srcSize); return; }
+    if (ZSTD_isError(cSize)) { DISPLAYLEVEL(3, "warning : could not compress sample size %u \n", (unsigned)srcSize); return; }
 
     if (cSize) {  /* if == 0; block is not compressible */
         const seqStore_t* const seqStorePtr = ZSTD_getSeqStore(esr.zc);
@@ -671,7 +671,7 @@
  * rewrite `countLit` to contain a mostly flat but still compressible distribution of literals.
  * necessary to avoid generating a non-compressible distribution that HUF_writeCTable() cannot encode.
  */
-static void ZDICT_flatLit(U32* countLit)
+static void ZDICT_flatLit(unsigned* countLit)
 {
     int u;
     for (u=1; u<256; u++) countLit[u] = 2;
@@ -687,14 +687,14 @@
                              const void* dictBuffer, size_t  dictBufferSize,
                                    unsigned notificationLevel)
 {
-    U32 countLit[256];
+    unsigned countLit[256];
     HUF_CREATE_STATIC_CTABLE(hufTable, 255);
-    U32 offcodeCount[OFFCODE_MAX+1];
+    unsigned offcodeCount[OFFCODE_MAX+1];
     short offcodeNCount[OFFCODE_MAX+1];
     U32 offcodeMax = ZSTD_highbit32((U32)(dictBufferSize + 128 KB));
-    U32 matchLengthCount[MaxML+1];
+    unsigned matchLengthCount[MaxML+1];
     short matchLengthNCount[MaxML+1];
-    U32 litLengthCount[MaxLL+1];
+    unsigned litLengthCount[MaxLL+1];
     short litLengthNCount[MaxLL+1];
     U32 repOffset[MAXREPOFFSET];
     offsetCount_t bestRepOffset[ZSTD_REP_NUM+1];
@@ -983,33 +983,33 @@
 
     /* display best matches */
     if (params.zParams.notificationLevel>= 3) {
-        U32 const nb = MIN(25, dictList[0].pos);
-        U32 const dictContentSize = ZDICT_dictSize(dictList);
-        U32 u;
-        DISPLAYLEVEL(3, "\n %u segments found, of total size %u \n", dictList[0].pos-1, dictContentSize);
+        unsigned const nb = MIN(25, dictList[0].pos);
+        unsigned const dictContentSize = ZDICT_dictSize(dictList);
+        unsigned u;
+        DISPLAYLEVEL(3, "\n %u segments found, of total size %u \n", (unsigned)dictList[0].pos-1, dictContentSize);
         DISPLAYLEVEL(3, "list %u best segments \n", nb-1);
         for (u=1; u<nb; u++) {
-            U32 const pos = dictList[u].pos;
-            U32 const length = dictList[u].length;
+            unsigned const pos = dictList[u].pos;
+            unsigned const length = dictList[u].length;
             U32 const printedLength = MIN(40, length);
             if ((pos > samplesBuffSize) || ((pos + length) > samplesBuffSize)) {
                 free(dictList);
                 return ERROR(GENERIC);   /* should never happen */
             }
             DISPLAYLEVEL(3, "%3u:%3u bytes at pos %8u, savings %7u bytes |",
-                         u, length, pos, dictList[u].savings);
+                         u, length, pos, (unsigned)dictList[u].savings);
             ZDICT_printHex((const char*)samplesBuffer+pos, printedLength);
             DISPLAYLEVEL(3, "| \n");
     }   }
 
 
     /* create dictionary */
-    {   U32 dictContentSize = ZDICT_dictSize(dictList);
+    {   unsigned dictContentSize = ZDICT_dictSize(dictList);
         if (dictContentSize < ZDICT_CONTENTSIZE_MIN) { free(dictList); return ERROR(dictionaryCreation_failed); }   /* dictionary content too small */
         if (dictContentSize < targetDictSize/4) {
-            DISPLAYLEVEL(2, "!  warning : selected content significantly smaller than requested (%u < %u) \n", dictContentSize, (U32)maxDictSize);
+            DISPLAYLEVEL(2, "!  warning : selected content significantly smaller than requested (%u < %u) \n", dictContentSize, (unsigned)maxDictSize);
             if (samplesBuffSize < 10 * targetDictSize)
-                DISPLAYLEVEL(2, "!  consider increasing the number of samples (total size : %u MB)\n", (U32)(samplesBuffSize>>20));
+                DISPLAYLEVEL(2, "!  consider increasing the number of samples (total size : %u MB)\n", (unsigned)(samplesBuffSize>>20));
             if (minRep > MINRATIO) {
                 DISPLAYLEVEL(2, "!  consider increasing selectivity to produce larger dictionary (-s%u) \n", selectivity+1);
                 DISPLAYLEVEL(2, "!  note : larger dictionaries are not necessarily better, test its efficiency on samples \n");
@@ -1017,9 +1017,9 @@
         }
 
         if ((dictContentSize > targetDictSize*3) && (nbSamples > 2*MINRATIO) && (selectivity>1)) {
-            U32 proposedSelectivity = selectivity-1;
+            unsigned proposedSelectivity = selectivity-1;
             while ((nbSamples >> proposedSelectivity) <= MINRATIO) { proposedSelectivity--; }
-            DISPLAYLEVEL(2, "!  note : calculated dictionary significantly larger than requested (%u > %u) \n", dictContentSize, (U32)maxDictSize);
+            DISPLAYLEVEL(2, "!  note : calculated dictionary significantly larger than requested (%u > %u) \n", dictContentSize, (unsigned)maxDictSize);
             DISPLAYLEVEL(2, "!  consider increasing dictionary size, or produce denser dictionary (-s%u) \n", proposedSelectivity);
             DISPLAYLEVEL(2, "!  always test dictionary efficiency on real samples \n");
         }
diff --git a/contrib/python-zstandard/zstd/zstd.h b/contrib/python-zstandard/zstd/zstd.h
--- a/contrib/python-zstandard/zstd/zstd.h
+++ b/contrib/python-zstandard/zstd/zstd.h
@@ -71,16 +71,16 @@
 /*------   Version   ------*/
 #define ZSTD_VERSION_MAJOR    1
 #define ZSTD_VERSION_MINOR    3
-#define ZSTD_VERSION_RELEASE  6
+#define ZSTD_VERSION_RELEASE  8
 
 #define ZSTD_VERSION_NUMBER  (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)
-ZSTDLIB_API unsigned ZSTD_versionNumber(void);   /**< useful to check dll version */
+ZSTDLIB_API unsigned ZSTD_versionNumber(void);   /**< to check runtime library version */
 
 #define ZSTD_LIB_VERSION ZSTD_VERSION_MAJOR.ZSTD_VERSION_MINOR.ZSTD_VERSION_RELEASE
 #define ZSTD_QUOTE(str) #str
 #define ZSTD_EXPAND_AND_QUOTE(str) ZSTD_QUOTE(str)
 #define ZSTD_VERSION_STRING ZSTD_EXPAND_AND_QUOTE(ZSTD_LIB_VERSION)
-ZSTDLIB_API const char* ZSTD_versionString(void);   /* v1.3.0+ */
+ZSTDLIB_API const char* ZSTD_versionString(void);   /* requires v1.3.0+ */
 
 /***************************************
 *  Default constant
@@ -110,7 +110,7 @@
 ZSTDLIB_API size_t ZSTD_decompress( void* dst, size_t dstCapacity,
                               const void* src, size_t compressedSize);
 
-/*! ZSTD_getFrameContentSize() : added in v1.3.0
+/*! ZSTD_getFrameContentSize() : requires v1.3.0+
  *  `src` should point to the start of a ZSTD encoded frame.
  *  `srcSize` must be at least as large as the frame header.
  *            hint : any size >= `ZSTD_frameHeaderSize_max` is large enough.
@@ -167,8 +167,10 @@
 ZSTDLIB_API size_t     ZSTD_freeCCtx(ZSTD_CCtx* cctx);
 
 /*! ZSTD_compressCCtx() :
- *  Same as ZSTD_compress(), requires an allocated ZSTD_CCtx (see ZSTD_createCCtx()). */
-ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* ctx,
+ *  Same as ZSTD_compress(), using an explicit ZSTD_CCtx
+ *  The function will compress at requested compression level,
+ *  ignoring any other parameter */
+ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
                                      void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
                                      int compressionLevel);
@@ -184,8 +186,11 @@
 ZSTDLIB_API size_t     ZSTD_freeDCtx(ZSTD_DCtx* dctx);
 
 /*! ZSTD_decompressDCtx() :
- *  Same as ZSTD_decompress(), requires an allocated ZSTD_DCtx (see ZSTD_createDCtx()) */
-ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* ctx,
+ *  Same as ZSTD_decompress(),
+ *  requires an allocated ZSTD_DCtx.
+ *  Compatible with sticky parameters.
+ */
+ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx,
                                        void* dst, size_t dstCapacity,
                                  const void* src, size_t srcSize);
 
@@ -194,9 +199,12 @@
 *  Simple dictionary API
 ***************************/
 /*! ZSTD_compress_usingDict() :
- *  Compression using a predefined Dictionary (see dictBuilder/zdict.h).
+ *  Compression at an explicit compression level using a Dictionary.
+ *  A dictionary can be any arbitrary data segment (also called a prefix),
+ *  or a buffer with specified information (see dictBuilder/zdict.h).
  *  Note : This function loads the dictionary, resulting in significant startup delay.
- *  Note : When `dict == NULL || dictSize < 8` no dictionary is used. */
+ *         It's intended for a dictionary used only once.
+ *  Note 2 : When `dict == NULL || dictSize < 8` no dictionary is used. */
 ZSTDLIB_API size_t ZSTD_compress_usingDict(ZSTD_CCtx* ctx,
                                            void* dst, size_t dstCapacity,
                                      const void* src, size_t srcSize,
@@ -204,9 +212,10 @@
                                            int compressionLevel);
 
 /*! ZSTD_decompress_usingDict() :
- *  Decompression using a predefined Dictionary (see dictBuilder/zdict.h).
+ *  Decompression using a known Dictionary.
  *  Dictionary must be identical to the one used during compression.
  *  Note : This function loads the dictionary, resulting in significant startup delay.
+ *         It's intended for a dictionary used only once.
  *  Note : When `dict == NULL || dictSize < 8` no dictionary is used. */
 ZSTDLIB_API size_t ZSTD_decompress_usingDict(ZSTD_DCtx* dctx,
                                              void* dst, size_t dstCapacity,
@@ -214,17 +223,18 @@
                                        const void* dict,size_t dictSize);
 
 
-/**********************************
+/***********************************
  *  Bulk processing dictionary API
- *********************************/
+ **********************************/
 typedef struct ZSTD_CDict_s ZSTD_CDict;
 
 /*! ZSTD_createCDict() :
- *  When compressing multiple messages / blocks with the same dictionary, it's recommended to load it just once.
- *  ZSTD_createCDict() will create a digested dictionary, ready to start future compression operations without startup delay.
+ *  When compressing multiple messages / blocks using the same dictionary, it's recommended to load it only once.
+ *  ZSTD_createCDict() will create a digested dictionary, ready to start future compression operations without startup cost.
  *  ZSTD_CDict can be created once and shared by multiple threads concurrently, since its usage is read-only.
- *  `dictBuffer` can be released after ZSTD_CDict creation, since its content is copied within CDict
- *  Note : A ZSTD_CDict can be created with an empty dictionary, but it is inefficient for small data. */
+ * `dictBuffer` can be released after ZSTD_CDict creation, because its content is copied within CDict.
+ *  Consider experimental function `ZSTD_createCDict_byReference()` if you prefer to not duplicate `dictBuffer` content.
+ *  Note : A ZSTD_CDict can be created from an empty dictBuffer, but it is inefficient when used to compress small data. */
 ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dictBuffer, size_t dictSize,
                                          int compressionLevel);
 
@@ -234,11 +244,9 @@
 
 /*! ZSTD_compress_usingCDict() :
  *  Compression using a digested Dictionary.
- *  Faster startup than ZSTD_compress_usingDict(), recommended when same dictionary is used multiple times.
- *  Note that compression level is decided during dictionary creation.
- *  Frame parameters are hardcoded (dictID=yes, contentSize=yes, checksum=no)
- *  Note : ZSTD_compress_usingCDict() can be used with a ZSTD_CDict created from an empty dictionary.
- *         But it is inefficient for small data, and it is recommended to use ZSTD_compressCCtx(). */
+ *  Recommended when same dictionary is used multiple times.
+ *  Note : compression level is _decided at dictionary creation time_,
+ *     and frame parameters are hardcoded (dictID=yes, contentSize=yes, checksum=no) */
 ZSTDLIB_API size_t ZSTD_compress_usingCDict(ZSTD_CCtx* cctx,
                                             void* dst, size_t dstCapacity,
                                       const void* src, size_t srcSize,
@@ -249,7 +257,7 @@
 
 /*! ZSTD_createDDict() :
  *  Create a digested dictionary, ready to start decompression operation without startup delay.
- *  dictBuffer can be released after DDict creation, as its content is copied inside DDict */
+ *  dictBuffer can be released after DDict creation, as its content is copied inside DDict. */
 ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dictBuffer, size_t dictSize);
 
 /*! ZSTD_freeDDict() :
@@ -258,7 +266,7 @@
 
 /*! ZSTD_decompress_usingDDict() :
  *  Decompression using a digested Dictionary.
- *  Faster startup than ZSTD_decompress_usingDict(), recommended when same dictionary is used multiple times. */
+ *  Recommended when same dictionary is used multiple times. */
 ZSTDLIB_API size_t ZSTD_decompress_usingDDict(ZSTD_DCtx* dctx,
                                               void* dst, size_t dstCapacity,
                                         const void* src, size_t srcSize,
@@ -289,13 +297,17 @@
 *  A ZSTD_CStream object is required to track streaming operation.
 *  Use ZSTD_createCStream() and ZSTD_freeCStream() to create/release resources.
 *  ZSTD_CStream objects can be reused multiple times on consecutive compression operations.
-*  It is recommended to re-use ZSTD_CStream in situations where many streaming operations will be achieved consecutively,
-*  since it will play nicer with system's memory, by re-using already allocated memory.
-*  Use one separate ZSTD_CStream per thread for parallel execution.
+*  It is recommended to re-use ZSTD_CStream since it will play nicer with system's memory, by re-using already allocated memory.
+*
+*  For parallel execution, use one separate ZSTD_CStream per thread.
+*
+*  note : since v1.3.0, ZSTD_CStream and ZSTD_CCtx are the same thing.
 *
-*  Start a new compression by initializing ZSTD_CStream context.
-*  Use ZSTD_initCStream() to start a new compression operation.
-*  Use variants ZSTD_initCStream_usingDict() or ZSTD_initCStream_usingCDict() for streaming with dictionary (experimental section)
+*  Parameters are sticky : when starting a new compression on the same context,
+*  it will re-use the same sticky parameters as previous compression session.
+*  When in doubt, it's recommended to fully initialize the context before usage.
+*  Use ZSTD_initCStream() to set the parameter to a selected compression level.
+*  Use advanced API (ZSTD_CCtx_setParameter(), etc.) to set more specific parameters.
 *
 *  Use ZSTD_compressStream() as many times as necessary to consume input stream.
 *  The function will automatically update both `pos` fields within `input` and `output`.
@@ -304,12 +316,11 @@
 *  in which case `input.pos < input.size`.
 *  The caller must check if input has been entirely consumed.
 *  If not, the caller must make some room to receive more compressed data,
-*  typically by emptying output buffer, or allocating a new output buffer,
 *  and then present again remaining input data.
-*  @return : a size hint, preferred nb of bytes to use as input for next function call
-*            or an error code, which can be tested using ZSTD_isError().
-*            Note 1 : it's just a hint, to help latency a little, any other value will work fine.
-*            Note 2 : size hint is guaranteed to be <= ZSTD_CStreamInSize()
+* @return : a size hint, preferred nb of bytes to use as input for next function call
+*           or an error code, which can be tested using ZSTD_isError().
+*           Note 1 : it's just a hint, to help latency a little, any value will work fine.
+*           Note 2 : size hint is guaranteed to be <= ZSTD_CStreamInSize()
 *
 *  At any moment, it's possible to flush whatever data might remain stuck within internal buffer,
 *  using ZSTD_flushStream(). `output->pos` will be updated.
@@ -353,23 +364,28 @@
 *  Use ZSTD_createDStream() and ZSTD_freeDStream() to create/release resources.
 *  ZSTD_DStream objects can be re-used multiple times.
 *
-*  Use ZSTD_initDStream() to start a new decompression operation,
-*   or ZSTD_initDStream_usingDict() if decompression requires a dictionary.
-*   @return : recommended first input size
+*  Use ZSTD_initDStream() to start a new decompression operation.
+* @return : recommended first input size
+*  Alternatively, use advanced API to set specific properties.
 *
 *  Use ZSTD_decompressStream() repetitively to consume your input.
 *  The function will update both `pos` fields.
 *  If `input.pos < input.size`, some input has not been consumed.
 *  It's up to the caller to present again remaining data.
+*  The function tries to flush all data decoded immediately, respecting output buffer size.
 *  If `output.pos < output.size`, decoder has flushed everything it could.
-*  @return : 0 when a frame is completely decoded and fully flushed,
-*            an error code, which can be tested using ZSTD_isError(),
-*            any other value > 0, which means there is still some decoding to do to complete current frame.
-*            The return value is a suggested next input size (a hint to improve latency) that will never load more than the current frame.
+*  But if `output.pos == output.size`, there might be some data left within internal buffers.,
+*  In which case, call ZSTD_decompressStream() again to flush whatever remains in the buffer.
+*  Note : with no additional input provided, amount of data flushed is necessarily <= ZSTD_BLOCKSIZE_MAX.
+* @return : 0 when a frame is completely decoded and fully flushed,
+*        or an error code, which can be tested using ZSTD_isError(),
+*        or any other value > 0, which means there is still some decoding or flushing to do to complete current frame :
+*                                the return value is a suggested next input size (just a hint for better latency)
+*                                that will never request more than the remaining frame size.
 * *******************************************************************************/
 
 typedef ZSTD_DCtx ZSTD_DStream;  /**< DCtx and DStream are now effectively same object (>= v1.3.0) */
-                                 /* For compatibility with versions <= v1.2.0, continue to consider them separated. */
+                                 /* For compatibility with versions <= v1.2.0, prefer differentiating them. */
 /*===== ZSTD_DStream management functions =====*/
 ZSTDLIB_API ZSTD_DStream* ZSTD_createDStream(void);
 ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds);
@@ -386,77 +402,602 @@
 
 
 
-#if defined(ZSTD_STATIC_LINKING_ONLY) && !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
-#define ZSTD_H_ZSTD_STATIC_LINKING_ONLY
-
 /****************************************************************************************
  *   ADVANCED AND EXPERIMENTAL FUNCTIONS
  ****************************************************************************************
- * The definitions in this section are considered experimental.
+ * The definitions in the following section are considered experimental.
+ * They are provided for advanced scenarios.
  * They should never be used with a dynamic library, as prototypes may change in the future.
- * They are provided for advanced scenarios.
  * Use them only in association with static linking.
  * ***************************************************************************************/
 
+#if defined(ZSTD_STATIC_LINKING_ONLY) && !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
+#define ZSTD_H_ZSTD_STATIC_LINKING_ONLY
+
+
+/****************************************************************************************
+ *   Candidate API for promotion to stable status
+ ****************************************************************************************
+ * The following symbols and constants form the "staging area" :
+ * they are considered to join "stable API" by v1.4.0.
+ * The proposal is written so that it can be made stable "as is",
+ * though it's still possible to suggest improvements.
+ * Staging is in fact last chance for changes,
+ * the API is locked once reaching "stable" status.
+ * ***************************************************************************************/
+
+
+/* ===  Constants   === */
+
+/* all magic numbers are supposed read/written to/from files/memory using little-endian convention */
+#define ZSTD_MAGICNUMBER            0xFD2FB528    /* valid since v0.8.0 */
+#define ZSTD_MAGIC_DICTIONARY       0xEC30A437    /* valid since v0.7.0 */
+#define ZSTD_MAGIC_SKIPPABLE_START  0x184D2A50    /* all 16 values, from 0x184D2A50 to 0x184D2A5F, signal the beginning of a skippable frame */
+#define ZSTD_MAGIC_SKIPPABLE_MASK   0xFFFFFFF0
+
+#define ZSTD_BLOCKSIZELOG_MAX  17
+#define ZSTD_BLOCKSIZE_MAX     (1<<ZSTD_BLOCKSIZELOG_MAX)
+
+
+/* ===   query limits   === */
+
 ZSTDLIB_API int ZSTD_minCLevel(void);  /*!< minimum negative compression level allowed */
 
-/* ---  Constants  ---*/
-#define ZSTD_MAGICNUMBER            0xFD2FB528   /* v0.8+ */
-#define ZSTD_MAGIC_DICTIONARY       0xEC30A437   /* v0.7+ */
-#define ZSTD_MAGIC_SKIPPABLE_START  0x184D2A50U
+
+/* ===   frame size   === */
+
+/*! ZSTD_findFrameCompressedSize() :
+ * `src` should point to the start of a ZSTD frame or skippable frame.
+ * `srcSize` must be >= first frame size
+ * @return : the compressed size of the first frame starting at `src`,
+ *           suitable to pass as `srcSize` to `ZSTD_decompress` or similar,
+ *        or an error code if input is invalid */
+ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize);
+
+
+/* ===   Memory management   === */
+
+/*! ZSTD_sizeof_*() :
+ *  These functions give the _current_ memory usage of selected object.
+ *  Note that object memory usage can evolve (increase or decrease) over time. */
+ZSTDLIB_API size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx* cctx);
+ZSTDLIB_API size_t ZSTD_sizeof_DCtx(const ZSTD_DCtx* dctx);
+ZSTDLIB_API size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs);
+ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DStream* zds);
+ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict);
+ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict);
+
+
+/***************************************
+*  Advanced compression API
+***************************************/
+
+/* API design :
+ *   Parameters are pushed one by one into an existing context,
+ *   using ZSTD_CCtx_set*() functions.
+ *   Pushed parameters are sticky : they are valid for next compressed frame, and any subsequent frame.
+ *   "sticky" parameters are applicable to `ZSTD_compress2()` and `ZSTD_compressStream*()` !
+ *   They do not apply to "simple" one-shot variants such as ZSTD_compressCCtx()
+ *
+ *   It's possible to reset all parameters to "default" using ZSTD_CCtx_reset().
+ *
+ *   This API supercedes all other "advanced" API entry points in the experimental section.
+ *   In the future, we expect to remove from experimental API entry points which are redundant with this API.
+ */
+
+
+/* Compression strategies, listed from fastest to strongest */
+typedef enum { ZSTD_fast=1,
+               ZSTD_dfast=2,
+               ZSTD_greedy=3,
+               ZSTD_lazy=4,
+               ZSTD_lazy2=5,
+               ZSTD_btlazy2=6,
+               ZSTD_btopt=7,
+               ZSTD_btultra=8,
+               ZSTD_btultra2=9
+               /* note : new strategies _might_ be added in the future.
+                         Only the order (from fast to strong) is guaranteed */
+} ZSTD_strategy;
+
+
+typedef enum {
 
-#define ZSTD_BLOCKSIZELOG_MAX 17
-#define ZSTD_BLOCKSIZE_MAX   (1<<ZSTD_BLOCKSIZELOG_MAX)   /* define, for static allocation */
+    /* compression parameters */
+    ZSTD_c_compressionLevel=100, /* Update all compression parameters according to pre-defined cLevel table
+                              * Default level is ZSTD_CLEVEL_DEFAULT==3.
+                              * Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT.
+                              * Note 1 : it's possible to pass a negative compression level.
+                              * Note 2 : setting a level sets all default values of other compression parameters */
+    ZSTD_c_windowLog=101,    /* Maximum allowed back-reference distance, expressed as power of 2.
+                              * Must be clamped between ZSTD_WINDOWLOG_MIN and ZSTD_WINDOWLOG_MAX.
+                              * Special: value 0 means "use default windowLog".
+                              * Note: Using a windowLog greater than ZSTD_WINDOWLOG_LIMIT_DEFAULT
+                              *       requires explicitly allowing such window size at decompression stage if using streaming. */
+    ZSTD_c_hashLog=102,      /* Size of the initial probe table, as a power of 2.
+                              * Resulting memory usage is (1 << (hashLog+2)).
+                              * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX.
+                              * Larger tables improve compression ratio of strategies <= dFast,
+                              * and improve speed of strategies > dFast.
+                              * Special: value 0 means "use default hashLog". */
+    ZSTD_c_chainLog=103,     /* Size of the multi-probe search table, as a power of 2.
+                              * Resulting memory usage is (1 << (chainLog+2)).
+                              * Must be clamped between ZSTD_CHAINLOG_MIN and ZSTD_CHAINLOG_MAX.
+                              * Larger tables result in better and slower compression.
+                              * This parameter is useless when using "fast" strategy.
+                              * It's still useful when using "dfast" strategy,
+                              * in which case it defines a secondary probe table.
+                              * Special: value 0 means "use default chainLog". */
+    ZSTD_c_searchLog=104,    /* Number of search attempts, as a power of 2.
+                              * More attempts result in better and slower compression.
+                              * This parameter is useless when using "fast" and "dFast" strategies.
+                              * Special: value 0 means "use default searchLog". */
+    ZSTD_c_minMatch=105,     /* Minimum size of searched matches.
+                              * Note that Zstandard can still find matches of smaller size,
+                              * it just tweaks its search algorithm to look for this size and larger.
+                              * Larger values increase compression and decompression speed, but decrease ratio.
+                              * Must be clamped between ZSTD_MINMATCH_MIN and ZSTD_MINMATCH_MAX.
+                              * Note that currently, for all strategies < btopt, effective minimum is 4.
+                              *                    , for all strategies > fast, effective maximum is 6.
+                              * Special: value 0 means "use default minMatchLength". */
+    ZSTD_c_targetLength=106, /* Impact of this field depends on strategy.
+                              * For strategies btopt, btultra & btultra2:
+                              *     Length of Match considered "good enough" to stop search.
+                              *     Larger values make compression stronger, and slower.
+                              * For strategy fast:
+                              *     Distance between match sampling.
+                              *     Larger values make compression faster, and weaker.
+                              * Special: value 0 means "use default targetLength". */
+    ZSTD_c_strategy=107,     /* See ZSTD_strategy enum definition.
+                              * The higher the value of selected strategy, the more complex it is,
+                              * resulting in stronger and slower compression.
+                              * Special: value 0 means "use default strategy". */
+
+    /* LDM mode parameters */
+    ZSTD_c_enableLongDistanceMatching=160, /* Enable long distance matching.
+                                     * This parameter is designed to improve compression ratio
+                                     * for large inputs, by finding large matches at long distance.
+                                     * It increases memory usage and window size.
+                                     * Note: enabling this parameter increases default ZSTD_c_windowLog to 128 MB
+                                     * except when expressly set to a different value. */
+    ZSTD_c_ldmHashLog=161,   /* Size of the table for long distance matching, as a power of 2.
+                              * Larger values increase memory usage and compression ratio,
+                              * but decrease compression speed.
+                              * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX
+                              * default: windowlog - 7.
+                              * Special: value 0 means "automatically determine hashlog". */
+    ZSTD_c_ldmMinMatch=162,  /* Minimum match size for long distance matcher.
+                              * Larger/too small values usually decrease compression ratio.
+                              * Must be clamped between ZSTD_LDM_MINMATCH_MIN and ZSTD_LDM_MINMATCH_MAX.
+                              * Special: value 0 means "use default value" (default: 64). */
+    ZSTD_c_ldmBucketSizeLog=163, /* Log size of each bucket in the LDM hash table for collision resolution.
+                              * Larger values improve collision resolution but decrease compression speed.
+                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX.
+                              * Special: value 0 means "use default value" (default: 3). */
+    ZSTD_c_ldmHashRateLog=164, /* Frequency of inserting/looking up entries into the LDM hash table.
+                              * Must be clamped between 0 and (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN).
+                              * Default is MAX(0, (windowLog - ldmHashLog)), optimizing hash table usage.
+                              * Larger values improve compression speed.
+                              * Deviating far from default value will likely result in a compression ratio decrease.
+                              * Special: value 0 means "automatically determine hashRateLog". */
+
+    /* frame parameters */
+    ZSTD_c_contentSizeFlag=200, /* Content size will be written into frame header _whenever known_ (default:1)
+                              * Content size must be known at the beginning of compression.
+                              * This is automatically the case when using ZSTD_compress2(),
+                              * For streaming variants, content size must be provided with ZSTD_CCtx_setPledgedSrcSize() */
+    ZSTD_c_checksumFlag=201, /* A 32-bits checksum of content is written at end of frame (default:0) */
+    ZSTD_c_dictIDFlag=202,   /* When applicable, dictionary's ID is written into frame header (default:1) */
 
-#define ZSTD_WINDOWLOG_MAX_32   30
-#define ZSTD_WINDOWLOG_MAX_64   31
-#define ZSTD_WINDOWLOG_MAX    ((unsigned)(sizeof(size_t) == 4 ? ZSTD_WINDOWLOG_MAX_32 : ZSTD_WINDOWLOG_MAX_64))
-#define ZSTD_WINDOWLOG_MIN      10
-#define ZSTD_HASHLOG_MAX      ((ZSTD_WINDOWLOG_MAX < 30) ? ZSTD_WINDOWLOG_MAX : 30)
-#define ZSTD_HASHLOG_MIN         6
-#define ZSTD_CHAINLOG_MAX_32    29
-#define ZSTD_CHAINLOG_MAX_64    30
-#define ZSTD_CHAINLOG_MAX     ((unsigned)(sizeof(size_t) == 4 ? ZSTD_CHAINLOG_MAX_32 : ZSTD_CHAINLOG_MAX_64))
-#define ZSTD_CHAINLOG_MIN       ZSTD_HASHLOG_MIN
-#define ZSTD_HASHLOG3_MAX       17
-#define ZSTD_SEARCHLOG_MAX     (ZSTD_WINDOWLOG_MAX-1)
-#define ZSTD_SEARCHLOG_MIN       1
-#define ZSTD_SEARCHLENGTH_MAX    7   /* only for ZSTD_fast, other strategies are limited to 6 */
-#define ZSTD_SEARCHLENGTH_MIN    3   /* only for ZSTD_btopt, other strategies are limited to 4 */
-#define ZSTD_TARGETLENGTH_MAX  ZSTD_BLOCKSIZE_MAX
-#define ZSTD_TARGETLENGTH_MIN    0   /* note : comparing this constant to an unsigned results in a tautological test */
-#define ZSTD_LDM_MINMATCH_MAX 4096
-#define ZSTD_LDM_MINMATCH_MIN    4
-#define ZSTD_LDM_BUCKETSIZELOG_MAX 8
+    /* multi-threading parameters */
+    /* These parameters are only useful if multi-threading is enabled (compiled with build macro ZSTD_MULTITHREAD).
+     * They return an error otherwise. */
+    ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
+                              * When nbWorkers >= 1, triggers asynchronous mode when used with ZSTD_compressStream*() :
+                              * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
+                              * while compression work is performed in parallel, within worker threads.
+                              * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
+                              *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
+                              * More workers improve speed, but also increase memory usage.
+                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned, compression is performed inside Caller's thread, all invocations are blocking */
+    ZSTD_c_jobSize=401,      /* Size of a compression job. This value is enforced only when nbWorkers >= 1.
+                              * Each compression job is completed in parallel, so this value can indirectly impact the nb of active threads.
+                              * 0 means default, which is dynamically determined based on compression parameters.
+                              * Job size must be a minimum of overlap size, or 1 MB, whichever is largest.
+                              * The minimum size is automatically and transparently enforced */
+    ZSTD_c_overlapLog=402,   /* Control the overlap size, as a fraction of window size.
+                              * The overlap size is an amount of data reloaded from previous job at the beginning of a new job.
+                              * It helps preserve compression ratio, while each job is compressed in parallel.
+                              * This value is enforced only when nbWorkers >= 1.
+                              * Larger values increase compression ratio, but decrease speed.
+                              * Possible values range from 0 to 9 :
+                              * - 0 means "default" : value will be determined by the library, depending on strategy
+                              * - 1 means "no overlap"
+                              * - 9 means "full overlap", using a full window size.
+                              * Each intermediate rank increases/decreases load size by a factor 2 :
+                              * 9: full window;  8: w/2;  7: w/4;  6: w/8;  5:w/16;  4: w/32;  3:w/64;  2:w/128;  1:no overlap;  0:default
+                              * default value varies between 6 and 9, depending on strategy */
+
+    /* note : additional experimental parameters are also available
+     * within the experimental section of the API.
+     * At the time of this writing, they include :
+     * ZSTD_c_rsyncable
+     * ZSTD_c_format
+     * ZSTD_c_forceMaxWindow
+     * ZSTD_c_forceAttachDict
+     * Because they are not stable, it's necessary to define ZSTD_STATIC_LINKING_ONLY to access them.
+     * note : never ever use experimentalParam? names directly;
+     *        also, the enums values themselves are unstable and can still change.
+     */
+     ZSTD_c_experimentalParam1=500,
+     ZSTD_c_experimentalParam2=10,
+     ZSTD_c_experimentalParam3=1000,
+     ZSTD_c_experimentalParam4=1001
+} ZSTD_cParameter;
+
+
+typedef struct {
+    size_t error;
+    int lowerBound;
+    int upperBound;
+} ZSTD_bounds;
+
+/*! ZSTD_cParam_getBounds() :
+ *  All parameters must belong to an interval with lower and upper bounds,
+ *  otherwise they will either trigger an error or be automatically clamped.
+ * @return : a structure, ZSTD_bounds, which contains
+ *         - an error status field, which must be tested using ZSTD_isError()
+ *         - lower and upper bounds, both inclusive
+ */
+ZSTDLIB_API ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter cParam);
+
+/*! ZSTD_CCtx_setParameter() :
+ *  Set one compression parameter, selected by enum ZSTD_cParameter.
+ *  All parameters have valid bounds. Bounds can be queried using ZSTD_cParam_getBounds().
+ *  Providing a value beyond bound will either clamp it, or trigger an error (depending on parameter).
+ *  Setting a parameter is generally only possible during frame initialization (before starting compression).
+ *  Exception : when using multi-threading mode (nbWorkers >= 1),
+ *              the following parameters can be updated _during_ compression (within same frame):
+ *              => compressionLevel, hashLog, chainLog, searchLog, minMatch, targetLength and strategy.
+ *              new parameters will be active for next job only (after a flush()).
+ * @return : an error code (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int value);
 
-#define ZSTD_FRAMEHEADERSIZE_PREFIX 5   /* minimum input size to know frame header size */
-#define ZSTD_FRAMEHEADERSIZE_MIN    6
-#define ZSTD_FRAMEHEADERSIZE_MAX   18   /* for static allocation */
-static const size_t ZSTD_frameHeaderSize_prefix = ZSTD_FRAMEHEADERSIZE_PREFIX;
-static const size_t ZSTD_frameHeaderSize_min = ZSTD_FRAMEHEADERSIZE_MIN;
-static const size_t ZSTD_frameHeaderSize_max = ZSTD_FRAMEHEADERSIZE_MAX;
-static const size_t ZSTD_skippableHeaderSize = 8;  /* magic number + skippable frame length */
+/*! ZSTD_CCtx_setPledgedSrcSize() :
+ *  Total input data size to be compressed as a single frame.
+ *  Value will be written in frame header, unless if explicitly forbidden using ZSTD_c_contentSizeFlag.
+ *  This value will also be controlled at end of frame, and trigger an error if not respected.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Note 1 : pledgedSrcSize==0 actually means zero, aka an empty frame.
+ *           In order to mean "unknown content size", pass constant ZSTD_CONTENTSIZE_UNKNOWN.
+ *           ZSTD_CONTENTSIZE_UNKNOWN is default value for any new frame.
+ *  Note 2 : pledgedSrcSize is only valid once, for the next frame.
+ *           It's discarded at the end of the frame, and replaced by ZSTD_CONTENTSIZE_UNKNOWN.
+ *  Note 3 : Whenever all input data is provided and consumed in a single round,
+ *           for example with ZSTD_compress2(),
+ *           or invoking immediately ZSTD_compressStream2(,,,ZSTD_e_end),
+ *           this value is automatically overriden by srcSize instead.
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize);
+
+/*! ZSTD_CCtx_loadDictionary() :
+ *  Create an internal CDict from `dict` buffer.
+ *  Decompression will have to use same dictionary.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Special: Loading a NULL (or 0-size) dictionary invalidates previous dictionary,
+ *           meaning "return to no-dictionary mode".
+ *  Note 1 : Dictionary is sticky, it will be used for all future compressed frames.
+ *           To return to "no-dictionary" situation, load a NULL dictionary (or reset parameters).
+ *  Note 2 : Loading a dictionary involves building tables.
+ *           It's also a CPU consuming operation, with non-negligible impact on latency.
+ *           Tables are dependent on compression parameters, and for this reason,
+ *           compression parameters can no longer be changed after loading a dictionary.
+ *  Note 3 :`dict` content will be copied internally.
+ *           Use experimental ZSTD_CCtx_loadDictionary_byReference() to reference content instead.
+ *           In such a case, dictionary buffer must outlive its users.
+ *  Note 4 : Use ZSTD_CCtx_loadDictionary_advanced()
+ *           to precisely select how dictionary content must be interpreted. */
+ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary(ZSTD_CCtx* cctx, const void* dict, size_t dictSize);
+
+/*! ZSTD_CCtx_refCDict() :
+ *  Reference a prepared dictionary, to be used for all next compressed frames.
+ *  Note that compression parameters are enforced from within CDict,
+ *  and supercede any compression parameter previously set within CCtx.
+ *  The dictionary will remain valid for future compressed frames using same CCtx.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Special : Referencing a NULL CDict means "return to no-dictionary mode".
+ *  Note 1 : Currently, only one dictionary can be managed.
+ *           Referencing a new dictionary effectively "discards" any previous one.
+ *  Note 2 : CDict is just referenced, its lifetime must outlive its usage within CCtx. */
+ZSTDLIB_API size_t ZSTD_CCtx_refCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict);
+
+/*! ZSTD_CCtx_refPrefix() :
+ *  Reference a prefix (single-usage dictionary) for next compressed frame.
+ *  A prefix is **only used once**. Tables are discarded at end of frame (ZSTD_e_end).
+ *  Decompression will need same prefix to properly regenerate data.
+ *  Compressing with a prefix is similar in outcome as performing a diff and compressing it,
+ *  but performs much faster, especially during decompression (compression speed is tunable with compression level).
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Special: Adding any prefix (including NULL) invalidates any previous prefix or dictionary
+ *  Note 1 : Prefix buffer is referenced. It **must** outlive compression.
+ *           Its content must remain unmodified during compression.
+ *  Note 2 : If the intention is to diff some large src data blob with some prior version of itself,
+ *           ensure that the window size is large enough to contain the entire source.
+ *           See ZSTD_c_windowLog.
+ *  Note 3 : Referencing a prefix involves building tables, which are dependent on compression parameters.
+ *           It's a CPU consuming operation, with non-negligible impact on latency.
+ *           If there is a need to use the same prefix multiple times, consider loadDictionary instead.
+ *  Note 4 : By default, the prefix is interpreted as raw content (ZSTD_dm_rawContent).
+ *           Use experimental ZSTD_CCtx_refPrefix_advanced() to alter dictionary interpretation. */
+ZSTDLIB_API size_t ZSTD_CCtx_refPrefix(ZSTD_CCtx* cctx,
+                                 const void* prefix, size_t prefixSize);
+
+
+typedef enum {
+    ZSTD_reset_session_only = 1,
+    ZSTD_reset_parameters = 2,
+    ZSTD_reset_session_and_parameters = 3
+} ZSTD_ResetDirective;
+
+/*! ZSTD_CCtx_reset() :
+ *  There are 2 different things that can be reset, independently or jointly :
+ *  - The session : will stop compressing current frame, and make CCtx ready to start a new one.
+ *                  Useful after an error, or to interrupt any ongoing compression.
+ *                  Any internal data not yet flushed is cancelled.
+ *                  Compression parameters and dictionary remain unchanged.
+ *                  They will be used to compress next frame.
+ *                  Resetting session never fails.
+ *  - The parameters : changes all parameters back to "default".
+ *                  This removes any reference to any dictionary too.
+ *                  Parameters can only be changed between 2 sessions (i.e. no compression is currently ongoing)
+ *                  otherwise the reset fails, and function returns an error value (which can be tested using ZSTD_isError())
+ *  - Both : similar to resetting the session, followed by resetting parameters.
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset);
 
 
 
+/*! ZSTD_compress2() :
+ *  Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
+ *  ZSTD_compress2() always starts a new frame.
+ *  Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
+ *  - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
+ *  - The function is always blocking, returns when compression is completed.
+ *  Hint : compression runs faster if `dstCapacity` >=  `ZSTD_compressBound(srcSize)`.
+ * @return : compressed size written into `dst` (<= `dstCapacity),
+ *           or an error code if it fails (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
+                                   void* dst, size_t dstCapacity,
+                             const void* src, size_t srcSize);
+
+typedef enum {
+    ZSTD_e_continue=0, /* collect more data, encoder decides when to output compressed result, for optimal compression ratio */
+    ZSTD_e_flush=1,    /* flush any data provided so far,
+                        * it creates (at least) one new block, that can be decoded immediately on reception;
+                        * frame will continue: any future data can still reference previously compressed data, improving compression. */
+    ZSTD_e_end=2       /* flush any remaining data _and_ close current frame.
+                        * note that frame is only closed after compressed data is fully flushed (return value == 0).
+                        * After that point, any additional data starts a new frame.
+                        * note : each frame is independent (does not reference any content from previous frame). */
+} ZSTD_EndDirective;
+
+/*! ZSTD_compressStream2() :
+ *  Behaves about the same as ZSTD_compressStream, with additional control on end directive.
+ *  - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
+ *  - Compression parameters cannot be changed once compression is started (save a list of exceptions in multi-threading mode)
+ *  - outpot->pos must be <= dstCapacity, input->pos must be <= srcSize
+ *  - outpot->pos and input->pos will be updated. They are guaranteed to remain below their respective limit.
+ *  - When nbWorkers==0 (default), function is blocking : it completes its job before returning to caller.
+ *  - When nbWorkers>=1, function is non-blocking : it just acquires a copy of input, and distributes jobs to internal worker threads, flush whatever is available,
+ *                                                  and then immediately returns, just indicating that there is some data remaining to be flushed.
+ *                                                  The function nonetheless guarantees forward progress : it will return only after it reads or write at least 1+ byte.
+ *  - Exception : if the first call requests a ZSTD_e_end directive and provides enough dstCapacity, the function delegates to ZSTD_compress2() which is always blocking.
+ *  - @return provides a minimum amount of data remaining to be flushed from internal buffers
+ *            or an error code, which can be tested using ZSTD_isError().
+ *            if @return != 0, flush is not fully completed, there is still some data left within internal buffers.
+ *            This is useful for ZSTD_e_flush, since in this case more flushes are necessary to empty all buffers.
+ *            For ZSTD_e_end, @return == 0 when internal buffers are fully flushed and frame is completed.
+ *  - after a ZSTD_e_end directive, if internal buffer is not fully flushed (@return != 0),
+ *            only ZSTD_e_end or ZSTD_e_flush operations are allowed.
+ *            Before starting a new compression job, or changing compression parameters,
+ *            it is required to fully flush internal buffers.
+ */
+ZSTDLIB_API size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
+                                         ZSTD_outBuffer* output,
+                                         ZSTD_inBuffer* input,
+                                         ZSTD_EndDirective endOp);
+
+
+
+/* ============================== */
+/*   Advanced decompression API   */
+/* ============================== */
+
+/* The advanced API pushes parameters one by one into an existing DCtx context.
+ * Parameters are sticky, and remain valid for all following frames
+ * using the same DCtx context.
+ * It's possible to reset parameters to default values using ZSTD_DCtx_reset().
+ * Note : This API is compatible with existing ZSTD_decompressDCtx() and ZSTD_decompressStream().
+ *        Therefore, no new decompression function is necessary.
+ */
+
+
+typedef enum {
+
+    ZSTD_d_windowLogMax=100, /* Select a size limit (in power of 2) beyond which
+                              * the streaming API will refuse to allocate memory buffer
+                              * in order to protect the host from unreasonable memory requirements.
+                              * This parameter is only useful in streaming mode, since no internal buffer is allocated in single-pass mode.
+                              * By default, a decompression context accepts window sizes <= (1 << ZSTD_WINDOWLOG_LIMIT_DEFAULT) */
+
+    /* note : additional experimental parameters are also available
+     * within the experimental section of the API.
+     * At the time of this writing, they include :
+     * ZSTD_c_format
+     * Because they are not stable, it's necessary to define ZSTD_STATIC_LINKING_ONLY to access them.
+     * note : never ever use experimentalParam? names directly
+     */
+     ZSTD_d_experimentalParam1=1000
+
+} ZSTD_dParameter;
+
+
+/*! ZSTD_dParam_getBounds() :
+ *  All parameters must belong to an interval with lower and upper bounds,
+ *  otherwise they will either trigger an error or be automatically clamped.
+ * @return : a structure, ZSTD_bounds, which contains
+ *         - an error status field, which must be tested using ZSTD_isError()
+ *         - both lower and upper bounds, inclusive
+ */
+ZSTDLIB_API ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dParam);
+
+/*! ZSTD_DCtx_setParameter() :
+ *  Set one compression parameter, selected by enum ZSTD_dParameter.
+ *  All parameters have valid bounds. Bounds can be queried using ZSTD_dParam_getBounds().
+ *  Providing a value beyond bound will either clamp it, or trigger an error (depending on parameter).
+ *  Setting a parameter is only possible during frame initialization (before starting decompression).
+ * @return : 0, or an error code (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_dParameter param, int value);
+
+
+/*! ZSTD_DCtx_loadDictionary() :
+ *  Create an internal DDict from dict buffer,
+ *  to be used to decompress next frames.
+ *  The dictionary remains valid for all future frames, until explicitly invalidated.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Special : Adding a NULL (or 0-size) dictionary invalidates any previous dictionary,
+ *            meaning "return to no-dictionary mode".
+ *  Note 1 : Loading a dictionary involves building tables,
+ *           which has a non-negligible impact on CPU usage and latency.
+ *           It's recommended to "load once, use many times", to amortize the cost
+ *  Note 2 :`dict` content will be copied internally, so `dict` can be released after loading.
+ *           Use ZSTD_DCtx_loadDictionary_byReference() to reference dictionary content instead.
+ *  Note 3 : Use ZSTD_DCtx_loadDictionary_advanced() to take control of
+ *           how dictionary content is loaded and interpreted.
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary(ZSTD_DCtx* dctx, const void* dict, size_t dictSize);
+
+/*! ZSTD_DCtx_refDDict() :
+ *  Reference a prepared dictionary, to be used to decompress next frames.
+ *  The dictionary remains active for decompression of future frames using same DCtx.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Note 1 : Currently, only one dictionary can be managed.
+ *           Referencing a new dictionary effectively "discards" any previous one.
+ *  Special: referencing a NULL DDict means "return to no-dictionary mode".
+ *  Note 2 : DDict is just referenced, its lifetime must outlive its usage from DCtx.
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_refDDict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict);
+
+/*! ZSTD_DCtx_refPrefix() :
+ *  Reference a prefix (single-usage dictionary) to decompress next frame.
+ *  This is the reverse operation of ZSTD_CCtx_refPrefix(),
+ *  and must use the same prefix as the one used during compression.
+ *  Prefix is **only used once**. Reference is discarded at end of frame.
+ *  End of frame is reached when ZSTD_decompressStream() returns 0.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ *  Note 1 : Adding any prefix (including NULL) invalidates any previously set prefix or dictionary
+ *  Note 2 : Prefix buffer is referenced. It **must** outlive decompression.
+ *           Prefix buffer must remain unmodified up to the end of frame,
+ *           reached when ZSTD_decompressStream() returns 0.
+ *  Note 3 : By default, the prefix is treated as raw content (ZSTD_dm_rawContent).
+ *           Use ZSTD_CCtx_refPrefix_advanced() to alter dictMode (Experimental section)
+ *  Note 4 : Referencing a raw content prefix has almost no cpu nor memory cost.
+ *           A full dictionary is more costly, as it requires building tables.
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_refPrefix(ZSTD_DCtx* dctx,
+                                 const void* prefix, size_t prefixSize);
+
+/*! ZSTD_DCtx_reset() :
+ *  Return a DCtx to clean state.
+ *  Session and parameters can be reset jointly or separately.
+ *  Parameters can only be reset when no active frame is being decompressed.
+ * @return : 0, or an error code, which can be tested with ZSTD_isError()
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDirective reset);
+
+
+
+/****************************************************************************************
+ *   experimental API (static linking only)
+ ****************************************************************************************
+ * The following symbols and constants
+ * are not planned to join "stable API" status in the near future.
+ * They can still change in future versions.
+ * Some of them are planned to remain in the static_only section indefinitely.
+ * Some of them might be removed in the future (especially when redundant with existing stable functions)
+ * ***************************************************************************************/
+
+#define ZSTD_FRAMEHEADERSIZE_PREFIX 5   /* minimum input size required to query frame header size */
+#define ZSTD_FRAMEHEADERSIZE_MIN    6
+#define ZSTD_FRAMEHEADERSIZE_MAX   18   /* can be useful for static allocation */
+#define ZSTD_SKIPPABLEHEADERSIZE    8
+
+/* compression parameter bounds */
+#define ZSTD_WINDOWLOG_MAX_32    30
+#define ZSTD_WINDOWLOG_MAX_64    31
+#define ZSTD_WINDOWLOG_MAX     ((int)(sizeof(size_t) == 4 ? ZSTD_WINDOWLOG_MAX_32 : ZSTD_WINDOWLOG_MAX_64))
+#define ZSTD_WINDOWLOG_MIN       10
+#define ZSTD_HASHLOG_MAX       ((ZSTD_WINDOWLOG_MAX < 30) ? ZSTD_WINDOWLOG_MAX : 30)
+#define ZSTD_HASHLOG_MIN          6
+#define ZSTD_CHAINLOG_MAX_32     29
+#define ZSTD_CHAINLOG_MAX_64     30
+#define ZSTD_CHAINLOG_MAX      ((int)(sizeof(size_t) == 4 ? ZSTD_CHAINLOG_MAX_32 : ZSTD_CHAINLOG_MAX_64))
+#define ZSTD_CHAINLOG_MIN        ZSTD_HASHLOG_MIN
+#define ZSTD_SEARCHLOG_MAX      (ZSTD_WINDOWLOG_MAX-1)
+#define ZSTD_SEARCHLOG_MIN        1
+#define ZSTD_MINMATCH_MAX         7   /* only for ZSTD_fast, other strategies are limited to 6 */
+#define ZSTD_MINMATCH_MIN         3   /* only for ZSTD_btopt+, faster strategies are limited to 4 */
+#define ZSTD_TARGETLENGTH_MAX    ZSTD_BLOCKSIZE_MAX
+#define ZSTD_TARGETLENGTH_MIN     0   /* note : comparing this constant to an unsigned results in a tautological test */
+#define ZSTD_STRATEGY_MIN        ZSTD_fast
+#define ZSTD_STRATEGY_MAX        ZSTD_btultra2
+
+
+#define ZSTD_OVERLAPLOG_MIN       0
+#define ZSTD_OVERLAPLOG_MAX       9
+
+#define ZSTD_WINDOWLOG_LIMIT_DEFAULT 27   /* by default, the streaming decoder will refuse any frame
+                                           * requiring larger than (1<<ZSTD_WINDOWLOG_LIMIT_DEFAULT) window size,
+                                           * to preserve host's memory from unreasonable requirements.
+                                           * This limit can be overriden using ZSTD_DCtx_setParameter(,ZSTD_d_windowLogMax,).
+                                           * The limit does not apply for one-pass decoders (such as ZSTD_decompress()), since no additional memory is allocated */
+
+
+/* LDM parameter bounds */
+#define ZSTD_LDM_HASHLOG_MIN      ZSTD_HASHLOG_MIN
+#define ZSTD_LDM_HASHLOG_MAX      ZSTD_HASHLOG_MAX
+#define ZSTD_LDM_MINMATCH_MIN        4
+#define ZSTD_LDM_MINMATCH_MAX     4096
+#define ZSTD_LDM_BUCKETSIZELOG_MIN   1
+#define ZSTD_LDM_BUCKETSIZELOG_MAX   8
+#define ZSTD_LDM_HASHRATELOG_MIN     0
+#define ZSTD_LDM_HASHRATELOG_MAX (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN)
+
+/* internal */
+#define ZSTD_HASHLOG3_MAX           17
+
+
 /* ---  Advanced types  --- */
-typedef enum { ZSTD_fast=1, ZSTD_dfast, ZSTD_greedy, ZSTD_lazy, ZSTD_lazy2,
-               ZSTD_btlazy2, ZSTD_btopt, ZSTD_btultra } ZSTD_strategy;   /* from faster to stronger */
+
+typedef struct ZSTD_CCtx_params_s ZSTD_CCtx_params;
 
 typedef struct {
-    unsigned windowLog;      /**< largest match distance : larger == more compression, more memory needed during decompression */
-    unsigned chainLog;       /**< fully searched segment : larger == more compression, slower, more memory (useless for fast) */
-    unsigned hashLog;        /**< dispatch table : larger == faster, more memory */
-    unsigned searchLog;      /**< nb of searches : larger == more compression, slower */
-    unsigned searchLength;   /**< match length searched : larger == faster decompression, sometimes less compression */
-    unsigned targetLength;   /**< acceptable match size for optimal parser (only) : larger == more compression, slower */
-    ZSTD_strategy strategy;
+    unsigned windowLog;       /**< largest match distance : larger == more compression, more memory needed during decompression */
+    unsigned chainLog;        /**< fully searched segment : larger == more compression, slower, more memory (useless for fast) */
+    unsigned hashLog;         /**< dispatch table : larger == faster, more memory */
+    unsigned searchLog;       /**< nb of searches : larger == more compression, slower */
+    unsigned minMatch;        /**< match length searched : larger == faster decompression, sometimes less compression */
+    unsigned targetLength;    /**< acceptable match size for optimal parser (only) : larger == more compression, slower */
+    ZSTD_strategy strategy;   /**< see ZSTD_strategy definition above */
 } ZSTD_compressionParameters;
 
 typedef struct {
-    unsigned contentSizeFlag; /**< 1: content size will be in frame header (when known) */
-    unsigned checksumFlag;    /**< 1: generate a 32-bits checksum at end of frame, for error detection */
-    unsigned noDictIDFlag;    /**< 1: no dictID will be saved into frame header (if dictionary compression) */
+    int contentSizeFlag; /**< 1: content size will be in frame header (when known) */
+    int checksumFlag;    /**< 1: generate a 32-bits checksum using XXH64 algorithm at end of frame, for error detection */
+    int noDictIDFlag;    /**< 1: no dictID will be saved into frame header (dictID is only useful for dictionary compression) */
 } ZSTD_frameParameters;
 
 typedef struct {
@@ -464,33 +1005,70 @@
     ZSTD_frameParameters fParams;
 } ZSTD_parameters;
 
-typedef struct ZSTD_CCtx_params_s ZSTD_CCtx_params;
-
 typedef enum {
-    ZSTD_dct_auto=0,      /* dictionary is "full" when starting with ZSTD_MAGIC_DICTIONARY, otherwise it is "rawContent" */
-    ZSTD_dct_rawContent,  /* ensures dictionary is always loaded as rawContent, even if it starts with ZSTD_MAGIC_DICTIONARY */
-    ZSTD_dct_fullDict     /* refuses to load a dictionary if it does not respect Zstandard's specification */
+    ZSTD_dct_auto = 0,       /* dictionary is "full" when starting with ZSTD_MAGIC_DICTIONARY, otherwise it is "rawContent" */
+    ZSTD_dct_rawContent = 1, /* ensures dictionary is always loaded as rawContent, even if it starts with ZSTD_MAGIC_DICTIONARY */
+    ZSTD_dct_fullDict = 2    /* refuses to load a dictionary if it does not respect Zstandard's specification, starting with ZSTD_MAGIC_DICTIONARY */
 } ZSTD_dictContentType_e;
 
 typedef enum {
-    ZSTD_dlm_byCopy = 0, /**< Copy dictionary content internally */
-    ZSTD_dlm_byRef,      /**< Reference dictionary content -- the dictionary buffer must outlive its users. */
+    ZSTD_dlm_byCopy = 0,  /**< Copy dictionary content internally */
+    ZSTD_dlm_byRef = 1,   /**< Reference dictionary content -- the dictionary buffer must outlive its users. */
 } ZSTD_dictLoadMethod_e;
 
+typedef enum {
+    /* Opened question : should we have a format ZSTD_f_auto ?
+     * Today, it would mean exactly the same as ZSTD_f_zstd1.
+     * But, in the future, should several formats become supported,
+     * on the compression side, it would mean "default format".
+     * On the decompression side, it would mean "automatic format detection",
+     * so that ZSTD_f_zstd1 would mean "accept *only* zstd frames".
+     * Since meaning is a little different, another option could be to define different enums for compression and decompression.
+     * This question could be kept for later, when there are actually multiple formats to support,
+     * but there is also the question of pinning enum values, and pinning value `0` is especially important */
+    ZSTD_f_zstd1 = 0,           /* zstd frame format, specified in zstd_compression_format.md (default) */
+    ZSTD_f_zstd1_magicless = 1, /* Variant of zstd frame format, without initial 4-bytes magic number.
+                                 * Useful to save 4 bytes per generated frame.
+                                 * Decoder cannot recognise automatically this format, requiring this instruction. */
+} ZSTD_format_e;
+
+typedef enum {
+    /* Note: this enum and the behavior it controls are effectively internal
+     * implementation details of the compressor. They are expected to continue
+     * to evolve and should be considered only in the context of extremely
+     * advanced performance tuning.
+     *
+     * Zstd currently supports the use of a CDict in two ways:
+     *
+     * - The contents of the CDict can be copied into the working context. This
+     *   means that the compression can search both the dictionary and input
+     *   while operating on a single set of internal tables. This makes
+     *   the compression faster per-byte of input. However, the initial copy of
+     *   the CDict's tables incurs a fixed cost at the beginning of the
+     *   compression. For small compressions (< 8 KB), that copy can dominate
+     *   the cost of the compression.
+     *
+     * - The CDict's tables can be used in-place. In this model, compression is
+     *   slower per input byte, because the compressor has to search two sets of
+     *   tables. However, this model incurs no start-up cost (as long as the
+     *   working context's tables can be reused). For small inputs, this can be
+     *   faster than copying the CDict's tables.
+     *
+     * Zstd has a simple internal heuristic that selects which strategy to use
+     * at the beginning of a compression. However, if experimentation shows that
+     * Zstd is making poor choices, it is possible to override that choice with
+     * this enum.
+     */
+    ZSTD_dictDefaultAttach = 0, /* Use the default heuristic. */
+    ZSTD_dictForceAttach   = 1, /* Never copy the dictionary. */
+    ZSTD_dictForceCopy     = 2, /* Always copy the dictionary. */
+} ZSTD_dictAttachPref_e;
 
 
 /***************************************
 *  Frame size functions
 ***************************************/
 
-/*! ZSTD_findFrameCompressedSize() :
- *  `src` should point to the start of a ZSTD encoded frame or skippable frame
- *  `srcSize` must be >= first frame size
- *  @return : the compressed size of the first frame starting at `src`,
- *            suitable to pass to `ZSTD_decompress` or similar,
- *            or an error code if input is invalid */
-ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t srcSize);
-
 /*! ZSTD_findDecompressedSize() :
  *  `src` should point the start of a series of ZSTD encoded and/or skippable frames
  *  `srcSize` must be the _exact_ size of this series
@@ -515,7 +1093,7 @@
 ZSTDLIB_API unsigned long long ZSTD_findDecompressedSize(const void* src, size_t srcSize);
 
 /*! ZSTD_frameHeaderSize() :
- *  srcSize must be >= ZSTD_frameHeaderSize_prefix.
+ *  srcSize must be >= ZSTD_FRAMEHEADERSIZE_PREFIX.
  * @return : size of the Frame Header,
  *           or an error code (if srcSize is too small) */
 ZSTDLIB_API size_t ZSTD_frameHeaderSize(const void* src, size_t srcSize);
@@ -525,16 +1103,6 @@
 *  Memory management
 ***************************************/
 
-/*! ZSTD_sizeof_*() :
- *  These functions give the current memory usage of selected object.
- *  Object memory usage can evolve when re-used. */
-ZSTDLIB_API size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx* cctx);
-ZSTDLIB_API size_t ZSTD_sizeof_DCtx(const ZSTD_DCtx* dctx);
-ZSTDLIB_API size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs);
-ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DStream* zds);
-ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict);
-ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict);
-
 /*! ZSTD_estimate*() :
  *  These functions make it possible to estimate memory usage
  *  of a future {D,C}Ctx, before its creation.
@@ -542,7 +1110,7 @@
  *  It will also consider src size to be arbitrarily "large", which is worst case.
  *  If srcSize is known to always be small, ZSTD_estimateCCtxSize_usingCParams() can provide a tighter estimation.
  *  ZSTD_estimateCCtxSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
- *  ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
+ *  ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
  *  Note : CCtx size estimation is only correct for single-threaded compression. */
 ZSTDLIB_API size_t ZSTD_estimateCCtxSize(int compressionLevel);
 ZSTDLIB_API size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams);
@@ -554,7 +1122,7 @@
  *  It will also consider src size to be arbitrarily "large", which is worst case.
  *  If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
  *  ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
- *  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
+ *  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
  *  Note : CStream size estimation is only correct for single-threaded compression.
  *  ZSTD_DStream memory budget depends on window Size.
  *  This information can be passed manually, using ZSTD_estimateDStreamSize,
@@ -617,6 +1185,7 @@
                                         ZSTD_dictLoadMethod_e dictLoadMethod,
                                         ZSTD_dictContentType_e dictContentType);
 
+
 /*! Custom memory allocation :
  *  These prototypes make it possible to pass your own allocation/free functions.
  *  ZSTD_customMem is provided at creation time, using ZSTD_create*_advanced() variants listed below.
@@ -651,8 +1220,9 @@
 
 /*! ZSTD_createCDict_byReference() :
  *  Create a digested dictionary for compression
- *  Dictionary content is simply referenced, and therefore stays in dictBuffer.
- *  It is important that dictBuffer outlives CDict, it must remain read accessible throughout the lifetime of CDict */
+ *  Dictionary content is just referenced, not duplicated.
+ *  As a consequence, `dictBuffer` **must** outlive CDict,
+ *  and its content must remain unmodified throughout the lifetime of CDict. */
 ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_byReference(const void* dictBuffer, size_t dictSize, int compressionLevel);
 
 /*! ZSTD_getCParams() :
@@ -675,22 +1245,161 @@
 ZSTDLIB_API ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, unsigned long long srcSize, size_t dictSize);
 
 /*! ZSTD_compress_advanced() :
-*   Same as ZSTD_compress_usingDict(), with fine-tune control over each compression parameter */
-ZSTDLIB_API size_t ZSTD_compress_advanced (ZSTD_CCtx* cctx,
-                                  void* dst, size_t dstCapacity,
-                            const void* src, size_t srcSize,
-                            const void* dict,size_t dictSize,
-                                  ZSTD_parameters params);
+ *  Same as ZSTD_compress_usingDict(), with fine-tune control over compression parameters (by structure) */
+ZSTDLIB_API size_t ZSTD_compress_advanced(ZSTD_CCtx* cctx,
+                                          void* dst, size_t dstCapacity,
+                                    const void* src, size_t srcSize,
+                                    const void* dict,size_t dictSize,
+                                          ZSTD_parameters params);
 
 /*! ZSTD_compress_usingCDict_advanced() :
-*   Same as ZSTD_compress_usingCDict(), with fine-tune control over frame parameters */
+ *  Same as ZSTD_compress_usingCDict(), with fine-tune control over frame parameters */
 ZSTDLIB_API size_t ZSTD_compress_usingCDict_advanced(ZSTD_CCtx* cctx,
-                                  void* dst, size_t dstCapacity,
-                            const void* src, size_t srcSize,
-                            const ZSTD_CDict* cdict, ZSTD_frameParameters fParams);
+                                              void* dst, size_t dstCapacity,
+                                        const void* src, size_t srcSize,
+                                        const ZSTD_CDict* cdict,
+                                              ZSTD_frameParameters fParams);
+
+
+/*! ZSTD_CCtx_loadDictionary_byReference() :
+ *  Same as ZSTD_CCtx_loadDictionary(), but dictionary content is referenced, instead of being copied into CCtx.
+ *  It saves some memory, but also requires that `dict` outlives its usage within `cctx` */
+ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary_byReference(ZSTD_CCtx* cctx, const void* dict, size_t dictSize);
+
+/*! ZSTD_CCtx_loadDictionary_advanced() :
+ *  Same as ZSTD_CCtx_loadDictionary(), but gives finer control over
+ *  how to load the dictionary (by copy ? by reference ?)
+ *  and how to interpret it (automatic ? force raw mode ? full mode only ?) */
+ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary_advanced(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType);
+
+/*! ZSTD_CCtx_refPrefix_advanced() :
+ *  Same as ZSTD_CCtx_refPrefix(), but gives finer control over
+ *  how to interpret prefix content (automatic ? force raw mode (default) ? full mode only ?) */
+ZSTDLIB_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const void* prefix, size_t prefixSize, ZSTD_dictContentType_e dictContentType);
+
+/* ===   experimental parameters   === */
+/* these parameters can be used with ZSTD_setParameter()
+ * they are not guaranteed to remain supported in the future */
+
+ /* Enables rsyncable mode,
+  * which makes compressed files more rsync friendly
+  * by adding periodic synchronization points to the compressed data.
+  * The target average block size is ZSTD_c_jobSize / 2.
+  * It's possible to modify the job size to increase or decrease
+  * the granularity of the synchronization point.
+  * Once the jobSize is smaller than the window size,
+  * it will result in compression ratio degradation.
+  * NOTE 1: rsyncable mode only works when multithreading is enabled.
+  * NOTE 2: rsyncable performs poorly in combination with long range mode,
+  * since it will decrease the effectiveness of synchronization points,
+  * though mileage may vary.
+  * NOTE 3: Rsyncable mode limits maximum compression speed to ~400 MB/s.
+  * If the selected compression level is already running significantly slower,
+  * the overall speed won't be significantly impacted.
+  */
+ #define ZSTD_c_rsyncable ZSTD_c_experimentalParam1
+
+/* Select a compression format.
+ * The value must be of type ZSTD_format_e.
+ * See ZSTD_format_e enum definition for details */
+#define ZSTD_c_format ZSTD_c_experimentalParam2
+
+/* Force back-reference distances to remain < windowSize,
+ * even when referencing into Dictionary content (default:0) */
+#define ZSTD_c_forceMaxWindow ZSTD_c_experimentalParam3
+
+/* Controls whether the contents of a CDict
+ * are used in place, or copied into the working context.
+ * Accepts values from the ZSTD_dictAttachPref_e enum.
+ * See the comments on that enum for an explanation of the feature. */
+#define ZSTD_c_forceAttachDict ZSTD_c_experimentalParam4
+
+/*! ZSTD_CCtx_getParameter() :
+ *  Get the requested compression parameter value, selected by enum ZSTD_cParameter,
+ *  and store it into int* value.
+ * @return : 0, or an error code (which can be tested with ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_getParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, int* value);
 
 
-/*--- Advanced decompression functions ---*/
+/*! ZSTD_CCtx_params :
+ *  Quick howto :
+ *  - ZSTD_createCCtxParams() : Create a ZSTD_CCtx_params structure
+ *  - ZSTD_CCtxParam_setParameter() : Push parameters one by one into
+ *                                    an existing ZSTD_CCtx_params structure.
+ *                                    This is similar to
+ *                                    ZSTD_CCtx_setParameter().
+ *  - ZSTD_CCtx_setParametersUsingCCtxParams() : Apply parameters to
+ *                                    an existing CCtx.
+ *                                    These parameters will be applied to
+ *                                    all subsequent frames.
+ *  - ZSTD_compressStream2() : Do compression using the CCtx.
+ *  - ZSTD_freeCCtxParams() : Free the memory.
+ *
+ *  This can be used with ZSTD_estimateCCtxSize_advanced_usingCCtxParams()
+ *  for static allocation of CCtx for single-threaded compression.
+ */
+ZSTDLIB_API ZSTD_CCtx_params* ZSTD_createCCtxParams(void);
+ZSTDLIB_API size_t ZSTD_freeCCtxParams(ZSTD_CCtx_params* params);
+
+/*! ZSTD_CCtxParams_reset() :
+ *  Reset params to default values.
+ */
+ZSTDLIB_API size_t ZSTD_CCtxParams_reset(ZSTD_CCtx_params* params);
+
+/*! ZSTD_CCtxParams_init() :
+ *  Initializes the compression parameters of cctxParams according to
+ *  compression level. All other parameters are reset to their default values.
+ */
+ZSTDLIB_API size_t ZSTD_CCtxParams_init(ZSTD_CCtx_params* cctxParams, int compressionLevel);
+
+/*! ZSTD_CCtxParams_init_advanced() :
+ *  Initializes the compression and frame parameters of cctxParams according to
+ *  params. All other parameters are reset to their default values.
+ */
+ZSTDLIB_API size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_parameters params);
+
+/*! ZSTD_CCtxParam_setParameter() :
+ *  Similar to ZSTD_CCtx_setParameter.
+ *  Set one compression parameter, selected by enum ZSTD_cParameter.
+ *  Parameters must be applied to a ZSTD_CCtx using ZSTD_CCtx_setParametersUsingCCtxParams().
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int value);
+
+/*! ZSTD_CCtxParam_getParameter() :
+ * Similar to ZSTD_CCtx_getParameter.
+ * Get the requested value of one compression parameter, selected by enum ZSTD_cParameter.
+ * @result : 0, or an error code (which can be tested with ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_CCtxParam_getParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, int* value);
+
+/*! ZSTD_CCtx_setParametersUsingCCtxParams() :
+ *  Apply a set of ZSTD_CCtx_params to the compression context.
+ *  This can be done even after compression is started,
+ *    if nbWorkers==0, this will have no impact until a new compression is started.
+ *    if nbWorkers>=1, new parameters will be picked up at next job,
+ *       with a few restrictions (windowLog, pledgedSrcSize, nbWorkers, jobSize, and overlapLog are not updated).
+ */
+ZSTDLIB_API size_t ZSTD_CCtx_setParametersUsingCCtxParams(
+        ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params);
+
+/*! ZSTD_compressStream2_simpleArgs() :
+ *  Same as ZSTD_compressStream2(),
+ *  but using only integral types as arguments.
+ *  This variant might be helpful for binders from dynamic languages
+ *  which have troubles handling structures containing memory pointers.
+ */
+ZSTDLIB_API size_t ZSTD_compressStream2_simpleArgs (
+                            ZSTD_CCtx* cctx,
+                            void* dst, size_t dstCapacity, size_t* dstPos,
+                      const void* src, size_t srcSize, size_t* srcPos,
+                            ZSTD_EndDirective endOp);
+
+
+/***************************************
+*  Advanced decompression functions
+***************************************/
 
 /*! ZSTD_isFrame() :
  *  Tells if the content of `buffer` starts with a valid Frame Identifier.
@@ -731,9 +1440,64 @@
  *  When identifying the exact failure cause, it's possible to use ZSTD_getFrameHeader(), which will provide a more precise error code. */
 ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize);
 
+/*! ZSTD_DCtx_loadDictionary_byReference() :
+ *  Same as ZSTD_DCtx_loadDictionary(),
+ *  but references `dict` content instead of copying it into `dctx`.
+ *  This saves memory if `dict` remains around.,
+ *  However, it's imperative that `dict` remains accessible (and unmodified) while being used, so it must outlive decompression. */
+ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary_byReference(ZSTD_DCtx* dctx, const void* dict, size_t dictSize);
+
+/*! ZSTD_DCtx_loadDictionary_advanced() :
+ *  Same as ZSTD_DCtx_loadDictionary(),
+ *  but gives direct control over
+ *  how to load the dictionary (by copy ? by reference ?)
+ *  and how to interpret it (automatic ? force raw mode ? full mode only ?). */
+ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary_advanced(ZSTD_DCtx* dctx, const void* dict, size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType);
+
+/*! ZSTD_DCtx_refPrefix_advanced() :
+ *  Same as ZSTD_DCtx_refPrefix(), but gives finer control over
+ *  how to interpret prefix content (automatic ? force raw mode (default) ? full mode only ?) */
+ZSTDLIB_API size_t ZSTD_DCtx_refPrefix_advanced(ZSTD_DCtx* dctx, const void* prefix, size_t prefixSize, ZSTD_dictContentType_e dictContentType);
+
+/*! ZSTD_DCtx_setMaxWindowSize() :
+ *  Refuses allocating internal buffers for frames requiring a window size larger than provided limit.
+ *  This protects a decoder context from reserving too much memory for itself (potential attack scenario).
+ *  This parameter is only useful in streaming mode, since no internal buffer is allocated in single-pass mode.
+ *  By default, a decompression context accepts all window sizes <= (1 << ZSTD_WINDOWLOG_LIMIT_DEFAULT)
+ * @return : 0, or an error code (which can be tested using ZSTD_isError()).
+ */
+ZSTDLIB_API size_t ZSTD_DCtx_setMaxWindowSize(ZSTD_DCtx* dctx, size_t maxWindowSize);
+
+/* ZSTD_d_format
+ * experimental parameter,
+ * allowing selection between ZSTD_format_e input compression formats
+ */
+#define ZSTD_d_format ZSTD_d_experimentalParam1
+
+/*! ZSTD_DCtx_setFormat() :
+ *  Instruct the decoder context about what kind of data to decode next.
+ *  This instruction is mandatory to decode data without a fully-formed header,
+ *  such ZSTD_f_zstd1_magicless for example.
+ * @return : 0, or an error code (which can be tested using ZSTD_isError()). */
+ZSTDLIB_API size_t ZSTD_DCtx_setFormat(ZSTD_DCtx* dctx, ZSTD_format_e format);
+
+/*! ZSTD_decompressStream_simpleArgs() :
+ *  Same as ZSTD_decompressStream(),
+ *  but using only integral types as arguments.
+ *  This can be helpful for binders from dynamic languages
+ *  which have troubles handling structures containing memory pointers.
+ */
+ZSTDLIB_API size_t ZSTD_decompressStream_simpleArgs (
+                            ZSTD_DCtx* dctx,
+                            void* dst, size_t dstCapacity, size_t* dstPos,
+                      const void* src, size_t srcSize, size_t* srcPos);
+
 
 /********************************************************************
 *  Advanced streaming functions
+*  Warning : most of these functions are now redundant with the Advanced API.
+*  Once Advanced API reaches "stable" status,
+*  redundant functions will be deprecated, and then at some point removed.
 ********************************************************************/
 
 /*=====   Advanced Streaming compression functions  =====*/
@@ -745,7 +1509,7 @@
 ZSTDLIB_API size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs, const ZSTD_CDict* cdict, ZSTD_frameParameters fParams, unsigned long long pledgedSrcSize);  /**< same as ZSTD_initCStream_usingCDict(), with control over frame parameters. pledgedSrcSize must be correct. If srcSize is not known at init time, use value ZSTD_CONTENTSIZE_UNKNOWN. */
 
 /*! ZSTD_resetCStream() :
- *  start a new compression job, using same parameters from previous job.
+ *  start a new frame, using same parameters from previous frame.
  *  This is typically useful to skip dictionary loading stage, since it will re-use it in-place.
  *  Note that zcs must be init at least once before using ZSTD_resetCStream().
  *  If pledgedSrcSize is not known at reset time, use macro ZSTD_CONTENTSIZE_UNKNOWN.
@@ -784,16 +1548,13 @@
  *  + there is no active job (could be checked with ZSTD_frameProgression()), or
  *  + oldest job is still actively compressing data,
  *    but everything it has produced has also been flushed so far,
- *    therefore flushing speed is currently limited by production speed of oldest job
- *    irrespective of the speed of concurrent newer jobs.
+ *    therefore flush speed is limited by production speed of oldest job
+ *    irrespective of the speed of concurrent (and newer) jobs.
  */
 ZSTDLIB_API size_t ZSTD_toFlushNow(ZSTD_CCtx* cctx);
 
 
-
 /*=====   Advanced Streaming decompression functions  =====*/
-typedef enum { DStream_p_maxWindowSize } ZSTD_DStreamParameter_e;
-ZSTDLIB_API size_t ZSTD_setDStreamParameter(ZSTD_DStream* zds, ZSTD_DStreamParameter_e paramType, unsigned paramValue);   /* obsolete : this API will be removed in a future version */
 ZSTDLIB_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize); /**< note: no dictionary will be used if dict == NULL or dictSize < 8 */
 ZSTDLIB_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict);  /**< note : ddict is referenced, it must outlive decompression session */
 ZSTDLIB_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);  /**< re-use decompression parameters from previous init; saves dictionary loading */
@@ -934,12 +1695,17 @@
     unsigned dictID;
     unsigned checksumFlag;
 } ZSTD_frameHeader;
+
 /** ZSTD_getFrameHeader() :
  *  decode Frame Header, or requires larger `srcSize`.
  * @return : 0, `zfhPtr` is correctly filled,
  *          >0, `srcSize` is too small, value is wanted `srcSize` amount,
  *           or an error code, which can be tested using ZSTD_isError() */
 ZSTDLIB_API size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, const void* src, size_t srcSize);   /**< doesn't consume input */
+/*! ZSTD_getFrameHeader_advanced() :
+ *  same as ZSTD_getFrameHeader(),
+ *  with added capability to select a format (like ZSTD_f_zstd1_magicless) */
+ZSTDLIB_API size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* zfhPtr, const void* src, size_t srcSize, ZSTD_format_e format);
 ZSTDLIB_API size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned long long frameContentSize);  /**< when frame content size is not known, pass in frameContentSize == ZSTD_CONTENTSIZE_UNKNOWN */
 
 ZSTDLIB_API size_t ZSTD_decompressBegin(ZSTD_DCtx* dctx);
@@ -956,522 +1722,6 @@
 
 
 
-/* ============================================ */
-/**       New advanced API (experimental)       */
-/* ============================================ */
-
-/* API design :
- *   In this advanced API, parameters are pushed one by one into an existing context,
- *   using ZSTD_CCtx_set*() functions.
- *   Pushed parameters are sticky : they are applied to next job, and any subsequent job.
- *   It's possible to reset parameters to "default" using ZSTD_CCtx_reset().
- *   Important : "sticky" parameters only work with `ZSTD_compress_generic()` !
- *               For any other entry point, "sticky" parameters are ignored !
- *
- *   This API is intended to replace all others advanced / experimental API entry points.
- */
-
-/* note on enum design :
- * All enum will be pinned to explicit values before reaching "stable API" status */
-
-typedef enum {
-    /* Opened question : should we have a format ZSTD_f_auto ?
-     * Today, it would mean exactly the same as ZSTD_f_zstd1.
-     * But, in the future, should several formats become supported,
-     * on the compression side, it would mean "default format".
-     * On the decompression side, it would mean "automatic format detection",
-     * so that ZSTD_f_zstd1 would mean "accept *only* zstd frames".
-     * Since meaning is a little different, another option could be to define different enums for compression and decompression.
-     * This question could be kept for later, when there are actually multiple formats to support,
-     * but there is also the question of pinning enum values, and pinning value `0` is especially important */
-    ZSTD_f_zstd1 = 0,        /* zstd frame format, specified in zstd_compression_format.md (default) */
-    ZSTD_f_zstd1_magicless,  /* Variant of zstd frame format, without initial 4-bytes magic number.
-                              * Useful to save 4 bytes per generated frame.
-                              * Decoder cannot recognise automatically this format, requiring instructions. */
-} ZSTD_format_e;
-
-typedef enum {
-    /* compression format */
-    ZSTD_p_format = 10,      /* See ZSTD_format_e enum definition.
-                              * Cast selected format as unsigned for ZSTD_CCtx_setParameter() compatibility. */
-
-    /* compression parameters */
-    ZSTD_p_compressionLevel=100, /* Update all compression parameters according to pre-defined cLevel table
-                              * Default level is ZSTD_CLEVEL_DEFAULT==3.
-                              * Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT.
-                              * Note 1 : it's possible to pass a negative compression level by casting it to unsigned type.
-                              * Note 2 : setting a level sets all default values of other compression parameters.
-                              * Note 3 : setting compressionLevel automatically updates ZSTD_p_compressLiterals. */
-    ZSTD_p_windowLog,        /* Maximum allowed back-reference distance, expressed as power of 2.
-                              * Must be clamped between ZSTD_WINDOWLOG_MIN and ZSTD_WINDOWLOG_MAX.
-                              * Special: value 0 means "use default windowLog".
-                              * Note: Using a window size greater than ZSTD_MAXWINDOWSIZE_DEFAULT (default: 2^27)
-                              *       requires explicitly allowing such window size during decompression stage. */
-    ZSTD_p_hashLog,          /* Size of the initial probe table, as a power of 2.
-                              * Resulting table size is (1 << (hashLog+2)).
-                              * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX.
-                              * Larger tables improve compression ratio of strategies <= dFast,
-                              * and improve speed of strategies > dFast.
-                              * Special: value 0 means "use default hashLog". */
-    ZSTD_p_chainLog,         /* Size of the multi-probe search table, as a power of 2.
-                              * Resulting table size is (1 << (chainLog+2)).
-                              * Must be clamped between ZSTD_CHAINLOG_MIN and ZSTD_CHAINLOG_MAX.
-                              * Larger tables result in better and slower compression.
-                              * This parameter is useless when using "fast" strategy.
-                              * Note it's still useful when using "dfast" strategy,
-                              * in which case it defines a secondary probe table.
-                              * Special: value 0 means "use default chainLog". */
-    ZSTD_p_searchLog,        /* Number of search attempts, as a power of 2.
-                              * More attempts result in better and slower compression.
-                              * This parameter is useless when using "fast" and "dFast" strategies.
-                              * Special: value 0 means "use default searchLog". */
-    ZSTD_p_minMatch,         /* Minimum size of searched matches (note : repCode matches can be smaller).
-                              * Larger values make faster compression and decompression, but decrease ratio.
-                              * Must be clamped between ZSTD_SEARCHLENGTH_MIN and ZSTD_SEARCHLENGTH_MAX.
-                              * Note that currently, for all strategies < btopt, effective minimum is 4.
-                              *                    , for all strategies > fast, effective maximum is 6.
-                              * Special: value 0 means "use default minMatchLength". */
-    ZSTD_p_targetLength,     /* Impact of this field depends on strategy.
-                              * For strategies btopt & btultra:
-                              *     Length of Match considered "good enough" to stop search.
-                              *     Larger values make compression stronger, and slower.
-                              * For strategy fast:
-                              *     Distance between match sampling.
-                              *     Larger values make compression faster, and weaker.
-                              * Special: value 0 means "use default targetLength". */
-    ZSTD_p_compressionStrategy, /* See ZSTD_strategy enum definition.
-                              * Cast selected strategy as unsigned for ZSTD_CCtx_setParameter() compatibility.
-                              * The higher the value of selected strategy, the more complex it is,
-                              * resulting in stronger and slower compression.
-                              * Special: value 0 means "use default strategy". */
-
-    ZSTD_p_enableLongDistanceMatching=160, /* Enable long distance matching.
-                                         * This parameter is designed to improve compression ratio
-                                         * for large inputs, by finding large matches at long distance.
-                                         * It increases memory usage and window size.
-                                         * Note: enabling this parameter increases ZSTD_p_windowLog to 128 MB
-                                         * except when expressly set to a different value. */
-    ZSTD_p_ldmHashLog,       /* Size of the table for long distance matching, as a power of 2.
-                              * Larger values increase memory usage and compression ratio,
-                              * but decrease compression speed.
-                              * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX
-                              * default: windowlog - 7.
-                              * Special: value 0 means "automatically determine hashlog". */
-    ZSTD_p_ldmMinMatch,      /* Minimum match size for long distance matcher.
-                              * Larger/too small values usually decrease compression ratio.
-                              * Must be clamped between ZSTD_LDM_MINMATCH_MIN and ZSTD_LDM_MINMATCH_MAX.
-                              * Special: value 0 means "use default value" (default: 64). */
-    ZSTD_p_ldmBucketSizeLog, /* Log size of each bucket in the LDM hash table for collision resolution.
-                              * Larger values improve collision resolution but decrease compression speed.
-                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX .
-                              * Special: value 0 means "use default value" (default: 3). */
-    ZSTD_p_ldmHashEveryLog,  /* Frequency of inserting/looking up entries in the LDM hash table.
-                              * Must be clamped between 0 and (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN).
-                              * Default is MAX(0, (windowLog - ldmHashLog)), optimizing hash table usage.
-                              * Larger values improve compression speed.
-                              * Deviating far from default value will likely result in a compression ratio decrease.
-                              * Special: value 0 means "automatically determine hashEveryLog". */
-
-    /* frame parameters */
-    ZSTD_p_contentSizeFlag=200, /* Content size will be written into frame header _whenever known_ (default:1)
-                              * Content size must be known at the beginning of compression,
-                              * it is provided using ZSTD_CCtx_setPledgedSrcSize() */
-    ZSTD_p_checksumFlag,     /* A 32-bits checksum of content is written at end of frame (default:0) */
-    ZSTD_p_dictIDFlag,       /* When applicable, dictionary's ID is written into frame header (default:1) */
-
-    /* multi-threading parameters */
-    /* These parameters are only useful if multi-threading is enabled (ZSTD_MULTITHREAD).
-     * They return an error otherwise. */
-    ZSTD_p_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
-                              * When nbWorkers >= 1, triggers asynchronous mode :
-                              * ZSTD_compress_generic() consumes some input, flush some output if possible, and immediately gives back control to caller,
-                              * while compression work is performed in parallel, within worker threads.
-                              * (note : a strong exception to this rule is when first invocation sets ZSTD_e_end : it becomes a blocking call).
-                              * More workers improve speed, but also increase memory usage.
-                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned, compression is performed inside Caller's thread, all invocations are blocking */
-    ZSTD_p_jobSize,          /* Size of a compression job. This value is enforced only in non-blocking mode.
-                              * Each compression job is completed in parallel, so this value indirectly controls the nb of active threads.
-                              * 0 means default, which is dynamically determined based on compression parameters.
-                              * Job size must be a minimum of overlapSize, or 1 MB, whichever is largest.
-                              * The minimum size is automatically and transparently enforced */
-    ZSTD_p_overlapSizeLog,   /* Size of previous input reloaded at the beginning of each job.
-                              * 0 => no overlap, 6(default) => use 1/8th of windowSize, >=9 => use full windowSize */
-
-    /* =================================================================== */
-    /* experimental parameters - no stability guaranteed                   */
-    /* =================================================================== */
-
-    ZSTD_p_forceMaxWindow=1100, /* Force back-reference distances to remain < windowSize,
-                              * even when referencing into Dictionary content (default:0) */
-    ZSTD_p_forceAttachDict,  /* ZSTD supports usage of a CDict in-place
-                              * (avoiding having to copy the compression tables
-                              * from the CDict into the working context). Using
-                              * a CDict in this way saves an initial setup step,
-                              * but comes at the cost of more work per byte of
-                              * input. ZSTD has a simple internal heuristic that
-                              * guesses which strategy will be faster. You can
-                              * use this flag to override that guess.
-                              *
-                              * Note that the by-reference, in-place strategy is
-                              * only used when reusing a compression context
-                              * with compatible compression parameters. (If
-                              * incompatible / uninitialized, the working
-                              * context needs to be cleared anyways, which is
-                              * about as expensive as overwriting it with the
-                              * dictionary context, so there's no savings in
-                              * using the CDict by-ref.)
-                              *
-                              * Values greater than 0 force attaching the dict.
-                              * Values less than 0 force copying the dict.
-                              * 0 selects the default heuristic-guided behavior.
-                              */
-
-} ZSTD_cParameter;
-
-
-/*! ZSTD_CCtx_setParameter() :
- *  Set one compression parameter, selected by enum ZSTD_cParameter.
- *  Setting a parameter is generally only possible during frame initialization (before starting compression).
- *  Exception : when using multi-threading mode (nbThreads >= 1),
- *              following parameters can be updated _during_ compression (within same frame):
- *              => compressionLevel, hashLog, chainLog, searchLog, minMatch, targetLength and strategy.
- *              new parameters will be active on next job, or after a flush().
- *  Note : when `value` type is not unsigned (int, or enum), cast it to unsigned for proper type checking.
- *  @result : informational value (typically, value being set, correctly clamped),
- *            or an error code (which can be tested with ZSTD_isError()). */
-ZSTDLIB_API size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value);
-
-/*! ZSTD_CCtx_getParameter() :
- * Get the requested value of one compression parameter, selected by enum ZSTD_cParameter.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_CCtx_getParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned* value);
-
-/*! ZSTD_CCtx_setPledgedSrcSize() :
- *  Total input data size to be compressed as a single frame.
- *  This value will be controlled at the end, and result in error if not respected.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Note 1 : 0 means zero, empty.
- *           In order to mean "unknown content size", pass constant ZSTD_CONTENTSIZE_UNKNOWN.
- *           ZSTD_CONTENTSIZE_UNKNOWN is default value for any new compression job.
- *  Note 2 : If all data is provided and consumed in a single round,
- *           this value is overriden by srcSize instead. */
-ZSTDLIB_API size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize);
-
-/*! ZSTD_CCtx_loadDictionary() :
- *  Create an internal CDict from `dict` buffer.
- *  Decompression will have to use same dictionary.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Special: Adding a NULL (or 0-size) dictionary invalidates previous dictionary,
- *           meaning "return to no-dictionary mode".
- *  Note 1 : Dictionary will be used for all future compression jobs.
- *           To return to "no-dictionary" situation, load a NULL dictionary
- *  Note 2 : Loading a dictionary involves building tables, which are dependent on compression parameters.
- *           For this reason, compression parameters cannot be changed anymore after loading a dictionary.
- *           It's also a CPU consuming operation, with non-negligible impact on latency.
- *  Note 3 :`dict` content will be copied internally.
- *           Use ZSTD_CCtx_loadDictionary_byReference() to reference dictionary content instead.
- *           In such a case, dictionary buffer must outlive its users.
- *  Note 4 : Use ZSTD_CCtx_loadDictionary_advanced()
- *           to precisely select how dictionary content must be interpreted. */
-ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary(ZSTD_CCtx* cctx, const void* dict, size_t dictSize);
-ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary_byReference(ZSTD_CCtx* cctx, const void* dict, size_t dictSize);
-ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary_advanced(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType);
-
-
-/*! ZSTD_CCtx_refCDict() :
- *  Reference a prepared dictionary, to be used for all next compression jobs.
- *  Note that compression parameters are enforced from within CDict,
- *  and supercede any compression parameter previously set within CCtx.
- *  The dictionary will remain valid for future compression jobs using same CCtx.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Special : adding a NULL CDict means "return to no-dictionary mode".
- *  Note 1 : Currently, only one dictionary can be managed.
- *           Adding a new dictionary effectively "discards" any previous one.
- *  Note 2 : CDict is just referenced, its lifetime must outlive CCtx. */
-ZSTDLIB_API size_t ZSTD_CCtx_refCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict);
-
-/*! ZSTD_CCtx_refPrefix() :
- *  Reference a prefix (single-usage dictionary) for next compression job.
- *  Decompression will need same prefix to properly regenerate data.
- *  Compressing with a prefix is similar in outcome as performing a diff and compressing it,
- *  but performs much faster, especially during decompression (compression speed is tunable with compression level).
- *  Note that prefix is **only used once**. Tables are discarded at end of compression job (ZSTD_e_end).
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Special: Adding any prefix (including NULL) invalidates any previous prefix or dictionary
- *  Note 1 : Prefix buffer is referenced. It **must** outlive compression job.
- *           Its contain must remain unmodified up to end of compression (ZSTD_e_end).
- *  Note 2 : If the intention is to diff some large src data blob with some prior version of itself,
- *           ensure that the window size is large enough to contain the entire source.
- *           See ZSTD_p_windowLog.
- *  Note 3 : Referencing a prefix involves building tables, which are dependent on compression parameters.
- *           It's a CPU consuming operation, with non-negligible impact on latency.
- *           If there is a need to use same prefix multiple times, consider loadDictionary instead.
- *  Note 4 : By default, the prefix is treated as raw content (ZSTD_dm_rawContent).
- *           Use ZSTD_CCtx_refPrefix_advanced() to alter dictMode. */
-ZSTDLIB_API size_t ZSTD_CCtx_refPrefix(ZSTD_CCtx* cctx,
-                                       const void* prefix, size_t prefixSize);
-ZSTDLIB_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx,
-                                       const void* prefix, size_t prefixSize,
-                                       ZSTD_dictContentType_e dictContentType);
-
-/*! ZSTD_CCtx_reset() :
- *  Return a CCtx to clean state.
- *  Useful after an error, or to interrupt an ongoing compression job and start a new one.
- *  Any internal data not yet flushed is cancelled.
- *  The parameters and dictionary are kept unchanged, to reset them use ZSTD_CCtx_resetParameters().
- */
-ZSTDLIB_API void ZSTD_CCtx_reset(ZSTD_CCtx* cctx);
-
-/*! ZSTD_CCtx_resetParameters() :
- *  All parameters are back to default values (compression level is ZSTD_CLEVEL_DEFAULT).
- *  Dictionary (if any) is dropped.
- *  Resetting parameters is only possible during frame initialization (before starting compression).
- *  To reset the context use ZSTD_CCtx_reset().
- *  @return 0 or an error code (which can be checked with ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_CCtx_resetParameters(ZSTD_CCtx* cctx);
-
-
-
-typedef enum {
-    ZSTD_e_continue=0, /* collect more data, encoder decides when to output compressed result, for optimal conditions */
-    ZSTD_e_flush,      /* flush any data provided so far - frame will continue, future data can still reference previous data for better compression */
-    ZSTD_e_end         /* flush any remaining data and close current frame. Any additional data starts a new frame. */
-} ZSTD_EndDirective;
-
-/*! ZSTD_compress_generic() :
- *  Behave about the same as ZSTD_compressStream. To note :
- *  - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_setParameter()
- *  - Compression parameters cannot be changed once compression is started.
- *  - outpot->pos must be <= dstCapacity, input->pos must be <= srcSize
- *  - outpot->pos and input->pos will be updated. They are guaranteed to remain below their respective limit.
- *  - In single-thread mode (default), function is blocking : it completed its job before returning to caller.
- *  - In multi-thread mode, function is non-blocking : it just acquires a copy of input, and distribute job to internal worker threads,
- *                                                     and then immediately returns, just indicating that there is some data remaining to be flushed.
- *                                                     The function nonetheless guarantees forward progress : it will return only after it reads or write at least 1+ byte.
- *  - Exception : in multi-threading mode, if the first call requests a ZSTD_e_end directive, it is blocking : it will complete compression before giving back control to caller.
- *  - @return provides a minimum amount of data remaining to be flushed from internal buffers
- *            or an error code, which can be tested using ZSTD_isError().
- *            if @return != 0, flush is not fully completed, there is still some data left within internal buffers.
- *            This is useful for ZSTD_e_flush, since in this case more flushes are necessary to empty all buffers.
- *            For ZSTD_e_end, @return == 0 when internal buffers are fully flushed and frame is completed.
- *  - after a ZSTD_e_end directive, if internal buffer is not fully flushed (@return != 0),
- *            only ZSTD_e_end or ZSTD_e_flush operations are allowed.
- *            Before starting a new compression job, or changing compression parameters,
- *            it is required to fully flush internal buffers.
- */
-ZSTDLIB_API size_t ZSTD_compress_generic (ZSTD_CCtx* cctx,
-                                          ZSTD_outBuffer* output,
-                                          ZSTD_inBuffer* input,
-                                          ZSTD_EndDirective endOp);
-
-
-/*! ZSTD_compress_generic_simpleArgs() :
- *  Same as ZSTD_compress_generic(),
- *  but using only integral types as arguments.
- *  Argument list is larger than ZSTD_{in,out}Buffer,
- *  but can be helpful for binders from dynamic languages
- *  which have troubles handling structures containing memory pointers.
- */
-ZSTDLIB_API size_t ZSTD_compress_generic_simpleArgs (
-                            ZSTD_CCtx* cctx,
-                            void* dst, size_t dstCapacity, size_t* dstPos,
-                      const void* src, size_t srcSize, size_t* srcPos,
-                            ZSTD_EndDirective endOp);
-
-
-/*! ZSTD_CCtx_params :
- *  Quick howto :
- *  - ZSTD_createCCtxParams() : Create a ZSTD_CCtx_params structure
- *  - ZSTD_CCtxParam_setParameter() : Push parameters one by one into
- *                                    an existing ZSTD_CCtx_params structure.
- *                                    This is similar to
- *                                    ZSTD_CCtx_setParameter().
- *  - ZSTD_CCtx_setParametersUsingCCtxParams() : Apply parameters to
- *                                    an existing CCtx.
- *                                    These parameters will be applied to
- *                                    all subsequent compression jobs.
- *  - ZSTD_compress_generic() : Do compression using the CCtx.
- *  - ZSTD_freeCCtxParams() : Free the memory.
- *
- *  This can be used with ZSTD_estimateCCtxSize_advanced_usingCCtxParams()
- *  for static allocation for single-threaded compression.
- */
-ZSTDLIB_API ZSTD_CCtx_params* ZSTD_createCCtxParams(void);
-ZSTDLIB_API size_t ZSTD_freeCCtxParams(ZSTD_CCtx_params* params);
-
-
-/*! ZSTD_CCtxParams_reset() :
- *  Reset params to default values.
- */
-ZSTDLIB_API size_t ZSTD_CCtxParams_reset(ZSTD_CCtx_params* params);
-
-/*! ZSTD_CCtxParams_init() :
- *  Initializes the compression parameters of cctxParams according to
- *  compression level. All other parameters are reset to their default values.
- */
-ZSTDLIB_API size_t ZSTD_CCtxParams_init(ZSTD_CCtx_params* cctxParams, int compressionLevel);
-
-/*! ZSTD_CCtxParams_init_advanced() :
- *  Initializes the compression and frame parameters of cctxParams according to
- *  params. All other parameters are reset to their default values.
- */
-ZSTDLIB_API size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_parameters params);
-
-
-/*! ZSTD_CCtxParam_setParameter() :
- *  Similar to ZSTD_CCtx_setParameter.
- *  Set one compression parameter, selected by enum ZSTD_cParameter.
- *  Parameters must be applied to a ZSTD_CCtx using ZSTD_CCtx_setParametersUsingCCtxParams().
- *  Note : when `value` is an enum, cast it to unsigned for proper type checking.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned value);
-
-/*! ZSTD_CCtxParam_getParameter() :
- * Similar to ZSTD_CCtx_getParameter.
- * Get the requested value of one compression parameter, selected by enum ZSTD_cParameter.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_CCtxParam_getParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned* value);
-
-/*! ZSTD_CCtx_setParametersUsingCCtxParams() :
- *  Apply a set of ZSTD_CCtx_params to the compression context.
- *  This can be done even after compression is started,
- *    if nbWorkers==0, this will have no impact until a new compression is started.
- *    if nbWorkers>=1, new parameters will be picked up at next job,
- *       with a few restrictions (windowLog, pledgedSrcSize, nbWorkers, jobSize, and overlapLog are not updated).
- */
-ZSTDLIB_API size_t ZSTD_CCtx_setParametersUsingCCtxParams(
-        ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params);
-
-
-/* ==================================== */
-/*===   Advanced decompression API   ===*/
-/* ==================================== */
-
-/* The following API works the same way as the advanced compression API :
- * a context is created, parameters are pushed into it one by one,
- * then the context can be used to decompress data using an interface similar to the straming API.
- */
-
-/*! ZSTD_DCtx_loadDictionary() :
- *  Create an internal DDict from dict buffer,
- *  to be used to decompress next frames.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Special : Adding a NULL (or 0-size) dictionary invalidates any previous dictionary,
- *            meaning "return to no-dictionary mode".
- *  Note 1 : `dict` content will be copied internally.
- *            Use ZSTD_DCtx_loadDictionary_byReference()
- *            to reference dictionary content instead.
- *            In which case, the dictionary buffer must outlive its users.
- *  Note 2 : Loading a dictionary involves building tables,
- *           which has a non-negligible impact on CPU usage and latency.
- *  Note 3 : Use ZSTD_DCtx_loadDictionary_advanced() to select
- *           how dictionary content will be interpreted and loaded.
- */
-ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary(ZSTD_DCtx* dctx, const void* dict, size_t dictSize);
-ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary_byReference(ZSTD_DCtx* dctx, const void* dict, size_t dictSize);
-ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary_advanced(ZSTD_DCtx* dctx, const void* dict, size_t dictSize, ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType);
-
-
-/*! ZSTD_DCtx_refDDict() :
- *  Reference a prepared dictionary, to be used to decompress next frames.
- *  The dictionary remains active for decompression of future frames using same DCtx.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Note 1 : Currently, only one dictionary can be managed.
- *           Referencing a new dictionary effectively "discards" any previous one.
- *  Special : adding a NULL DDict means "return to no-dictionary mode".
- *  Note 2 : DDict is just referenced, its lifetime must outlive its usage from DCtx.
- */
-ZSTDLIB_API size_t ZSTD_DCtx_refDDict(ZSTD_DCtx* dctx, const ZSTD_DDict* ddict);
-
-
-/*! ZSTD_DCtx_refPrefix() :
- *  Reference a prefix (single-usage dictionary) for next compression job.
- *  This is the reverse operation of ZSTD_CCtx_refPrefix(),
- *  and must use the same prefix as the one used during compression.
- *  Prefix is **only used once**. Reference is discarded at end of frame.
- *  End of frame is reached when ZSTD_DCtx_decompress_generic() returns 0.
- * @result : 0, or an error code (which can be tested with ZSTD_isError()).
- *  Note 1 : Adding any prefix (including NULL) invalidates any previously set prefix or dictionary
- *  Note 2 : Prefix buffer is referenced. It **must** outlive decompression job.
- *           Prefix buffer must remain unmodified up to the end of frame,
- *           reached when ZSTD_DCtx_decompress_generic() returns 0.
- *  Note 3 : By default, the prefix is treated as raw content (ZSTD_dm_rawContent).
- *           Use ZSTD_CCtx_refPrefix_advanced() to alter dictMode.
- *  Note 4 : Referencing a raw content prefix has almost no cpu nor memory cost.
- *           A fulldict prefix is more costly though.
- */
-ZSTDLIB_API size_t ZSTD_DCtx_refPrefix(ZSTD_DCtx* dctx,
-                                    const void* prefix, size_t prefixSize);
-ZSTDLIB_API size_t ZSTD_DCtx_refPrefix_advanced(ZSTD_DCtx* dctx,
-                                    const void* prefix, size_t prefixSize,
-                                    ZSTD_dictContentType_e dictContentType);
-
-
-/*! ZSTD_DCtx_setMaxWindowSize() :
- *  Refuses allocating internal buffers for frames requiring a window size larger than provided limit.
- *  This is useful to prevent a decoder context from reserving too much memory for itself (potential attack scenario).
- *  This parameter is only useful in streaming mode, since no internal buffer is allocated in direct mode.
- *  By default, a decompression context accepts all window sizes <= (1 << ZSTD_WINDOWLOG_MAX)
- * @return : 0, or an error code (which can be tested using ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_DCtx_setMaxWindowSize(ZSTD_DCtx* dctx, size_t maxWindowSize);
-
-
-/*! ZSTD_DCtx_setFormat() :
- *  Instruct the decoder context about what kind of data to decode next.
- *  This instruction is mandatory to decode data without a fully-formed header,
- *  such ZSTD_f_zstd1_magicless for example.
- * @return : 0, or an error code (which can be tested using ZSTD_isError()).
- */
-ZSTDLIB_API size_t ZSTD_DCtx_setFormat(ZSTD_DCtx* dctx, ZSTD_format_e format);
-
-
-/*! ZSTD_getFrameHeader_advanced() :
- *  same as ZSTD_getFrameHeader(),
- *  with added capability to select a format (like ZSTD_f_zstd1_magicless) */
-ZSTDLIB_API size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* zfhPtr,
-                        const void* src, size_t srcSize, ZSTD_format_e format);
-
-
-/*! ZSTD_decompress_generic() :
- *  Behave the same as ZSTD_decompressStream.
- *  Decompression parameters cannot be changed once decompression is started.
- * @return : an error code, which can be tested using ZSTD_isError()
- *           if >0, a hint, nb of expected input bytes for next invocation.
- *           `0` means : a frame has just been fully decoded and flushed.
- */
-ZSTDLIB_API size_t ZSTD_decompress_generic(ZSTD_DCtx* dctx,
-                                           ZSTD_outBuffer* output,
-                                           ZSTD_inBuffer* input);
-
-
-/*! ZSTD_decompress_generic_simpleArgs() :
- *  Same as ZSTD_decompress_generic(),
- *  but using only integral types as arguments.
- *  Argument list is larger than ZSTD_{in,out}Buffer,
- *  but can be helpful for binders from dynamic languages
- *  which have troubles handling structures containing memory pointers.
- */
-ZSTDLIB_API size_t ZSTD_decompress_generic_simpleArgs (
-                            ZSTD_DCtx* dctx,
-                            void* dst, size_t dstCapacity, size_t* dstPos,
-                      const void* src, size_t srcSize, size_t* srcPos);
-
-
-/*! ZSTD_DCtx_reset() :
- *  Return a DCtx to clean state.
- *  If a decompression was ongoing, any internal data not yet flushed is cancelled.
- *  All parameters are back to default values, including sticky ones.
- *  Dictionary (if any) is dropped.
- *  Parameters can be modified again after a reset.
- */
-ZSTDLIB_API void ZSTD_DCtx_reset(ZSTD_DCtx* dctx);
-
-
 
 /* ============================ */
 /**       Block level API       */
@@ -1491,10 +1741,10 @@
       + copyCCtx() and copyDCtx() can be used too
     - Block size is limited, it must be <= ZSTD_getBlockSize() <= ZSTD_BLOCKSIZE_MAX == 128 KB
       + If input is larger than a block size, it's necessary to split input data into multiple blocks
-      + For inputs larger than a single block size, consider using the regular ZSTD_compress() instead.
+      + For inputs larger than a single block, really consider using regular ZSTD_compress() instead.
         Frame metadata is not that costly, and quickly becomes negligible as source size grows larger.
     - When a block is considered not compressible enough, ZSTD_compressBlock() result will be zero.
-      In which case, nothing is produced into `dst`.
+      In which case, nothing is produced into `dst` !
       + User must test for such outcome and deal directly with uncompressed data
       + ZSTD_decompressBlock() doesn't accept uncompressed data as input !!!
       + In case of multiple successive blocks, should some of them be uncompressed,
diff --git a/tests/test-check-py3-compat.t b/tests/test-check-py3-compat.t
--- a/tests/test-check-py3-compat.t
+++ b/tests/test-check-py3-compat.t
@@ -36,14 +36,6 @@
   > -X mercurial/thirdparty \
   > | sed 's|\\|/|g' | xargs "$PYTHON" contrib/check-py3-compat.py \
   > | sed 's/[0-9][0-9]*)$/*)/'
-  contrib/python-zstandard/tests/test_compressor.py:324: SyntaxWarning: invalid escape sequence \( (py38 !)
-    with self.assertRaisesRegexp(zstd.ZstdError, 'cannot call compress\(\) after compressor'): (py38 !)
-  contrib/python-zstandard/tests/test_compressor.py:1329: SyntaxWarning: invalid escape sequence \( (py38 !)
-    'cannot call compress\(\) after compression finished'): (py38 !)
-  contrib/python-zstandard/tests/test_compressor.py:1341: SyntaxWarning: invalid escape sequence \( (py38 !)
-    'cannot call flush\(\) after compression finished'): (py38 !)
-  contrib/python-zstandard/tests/test_compressor.py:1353: SyntaxWarning: invalid escape sequence \( (py38 !)
-    'cannot call finish\(\) after compression finished'): (py38 !)
   hgext/convert/transport.py: error importing: <*Error> No module named 'svn.client' (error at transport.py:*) (glob) (?)
   hgext/infinitepush/sqlindexapi.py: error importing: <*Error> No module named 'mysql' (error at sqlindexapi.py:*) (glob) (?)
   mercurial/scmwindows.py: error importing: <ValueError> _type_ 'v' not supported (error at win32.py:*) (no-windows !)
diff --git a/tests/test-http-api-httpv2.t b/tests/test-http-api-httpv2.t
--- a/tests/test-http-api-httpv2.t
+++ b/tests/test-http-api-httpv2.t
@@ -729,7 +729,7 @@
   s>     \r\n
   s>     25\r\n
   s>     \x1d\x00\x00\x01\x00\x02\x042
-  s>     (\xb5/\xfd\x00P\xa4\x00\x00p\xa1FstatusBok\x81T\x00\x01\x00\tP\x02
+  s>     (\xb5/\xfd\x00X\xa4\x00\x00p\xa1FstatusBok\x81T\x00\x01\x00\tP\x02
   s>     \r\n
   s>     0\r\n
   s>     \r\n
diff --git a/tests/test-http-protocol.t b/tests/test-http-protocol.t
--- a/tests/test-http-protocol.t
+++ b/tests/test-http-protocol.t
@@ -96,7 +96,7 @@
 
   $ get-with-headers.py --hgproto '0.2 comp=zstd' $LOCALIP:$HGPORT '?cmd=getbundle&heads=e93700bd72895c5addab234c56d4024b487a362f&common=0000000000000000000000000000000000000000' > resp
   $ f --size --hexdump --bytes 36 --sha1 resp
-  resp: size=248, sha1=4d8d8f87fb82bd542ce52881fdc94f850748
+  resp: size=248, sha1=f11b5c098c638068b3d5fe2f9e6241bf5228
   0000: 32 30 30 20 53 63 72 69 70 74 20 6f 75 74 70 75 |200 Script outpu|
   0010: 74 20 66 6f 6c 6c 6f 77 73 0a 0a 04 7a 73 74 64 |t follows...zstd|
   0020: 28 b5 2f fd                                     |(./.|