Coverage for /var/srv/projects/api.amasfac.comuna18.com/tmp/venv/lib/python3.9/site-packages/numpy/lib/format.py: 8%
269 statements
coverage.py v6.4.4, created at 2023-07-17 14:22 -0600
"""
Binary serialization

NPY format
==========

A simple format for saving numpy arrays to disk with the full
information about them.

The ``.npy`` format is the standard binary file format in NumPy for
persisting a *single* arbitrary NumPy array on disk. The format stores all
of the shape and dtype information necessary to reconstruct the array
correctly even on another machine with a different architecture.
The format is designed to be as simple as possible while achieving
its limited goals.

The ``.npz`` format is the standard format for persisting *multiple* NumPy
arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy``
files, one for each array.

Capabilities
------------

- Can represent all NumPy arrays including nested record arrays and
  object arrays.

- Represents the data in its native binary form.

- Supports Fortran-contiguous arrays directly.

- Stores all of the necessary information to reconstruct the array
  including shape and dtype on a machine of a different
  architecture. Both little-endian and big-endian arrays are
  supported, and a file with little-endian numbers will yield
  a little-endian array on any machine reading the file. The
  types are described in terms of their actual sizes. For example,
  if a machine with a 64-bit C "long int" writes out an array with
  "long ints", a reading machine with 32-bit C "long ints" will yield
  an array with 64-bit integers.

- Is straightforward to reverse engineer. Datasets often live longer than
  the programs that created them. A competent developer should be
  able to create a solution in their preferred programming language to
  read most ``.npy`` files that they have been given without much
  documentation.

- Allows memory-mapping of the data. See `open_memmap`.

- Can be read from a filelike stream object instead of an actual file.
- Stores object arrays, i.e. arrays containing elements that are arbitrary
  Python objects. Files with object arrays cannot be memory-mapped, but
  they can be read from and written to disk.
Limitations
-----------

- Arbitrary subclasses of numpy.ndarray are not completely preserved.
  Subclasses will be accepted for writing, but only the array data will
  be written out. A regular numpy.ndarray object will be created
  upon reading the file.

.. warning::

  Due to limitations in the interpretation of structured dtypes, dtypes
  with fields with empty names will have the names replaced by 'f0', 'f1',
  etc. Such arrays will not round-trip through the format entirely
  accurately. The data is intact; only the field names will differ. We are
  working on a fix for this. This fix will not require a change in the
  file format. The arrays with such structures can still be saved and
  restored, and the correct dtype may be restored by using the
  ``loadedarray.view(correct_dtype)`` method.

File extensions
---------------

We recommend using the ``.npy`` and ``.npz`` extensions for files saved
in this format. This is by no means a requirement; applications may wish
to use these file formats but use an extension specific to the
application. In the absence of an obvious alternative, however,
we suggest using ``.npy`` and ``.npz``.
Version numbering
-----------------

The version numbering of these formats is independent of NumPy version
numbering. If the format is upgraded, the code in `numpy.lib.format` will
still be able to read and write Version 1.0 files.
Format Version 1.0
------------------

The first 6 bytes are a magic string: exactly ``\\x93NUMPY``.

The next 1 byte is an unsigned byte: the major version number of the file
format, e.g. ``\\x01``.

The next 1 byte is an unsigned byte: the minor version number of the file
format, e.g. ``\\x00``. Note: the version of the file format is not tied
to the version of the numpy package.

The next 2 bytes form a little-endian unsigned short int: the length of
the header data HEADER_LEN.

The next HEADER_LEN bytes form the header data describing the array's
format. It is an ASCII string which contains a Python literal expression
of a dictionary. It is terminated by a newline (``\\n``) and padded with
spaces (``\\x20``) to make the total of
``len(magic string) + 2 + len(length) + HEADER_LEN`` be evenly divisible
by 64 for alignment purposes.

The dictionary contains three keys:

    "descr" : dtype.descr
      An object that can be passed as an argument to the `numpy.dtype`
      constructor to create the array's dtype.
    "fortran_order" : bool
      Whether the array data is Fortran-contiguous or not. Since
      Fortran-contiguous arrays are a common form of non-C-contiguity,
      we allow them to be written directly to disk for efficiency.
    "shape" : tuple of int
      The shape of the array.

For repeatability and readability, the dictionary keys are sorted in
alphabetic order. This is for convenience only. A writer SHOULD implement
this if possible. A reader MUST NOT depend on this.

Following the header comes the array data. If the dtype contains Python
objects (i.e. ``dtype.hasobject is True``), then the data is a Python
pickle of the array. Otherwise the data is the contiguous (either C-
or Fortran-, depending on ``fortran_order``) bytes of the array.
Consumers can figure out the number of bytes by multiplying the number
of elements given by the shape (noting that ``shape=()`` means there is
1 element) by ``dtype.itemsize``.
Format Version 2.0
------------------

The version 1.0 format only allowed the array header to have a total size of
65535 bytes. This can be exceeded by structured arrays with a large number of
columns. The version 2.0 format extends the header size to 4 GiB.
`numpy.save` will automatically save in 2.0 format if the data requires it,
else it will always use the more compatible 1.0 format.

The description of the fourth element of the header therefore has become:
"The next 4 bytes form a little-endian unsigned int: the length of the header
data HEADER_LEN."

Format Version 3.0
------------------

This version replaces the ASCII string (which in practice was latin1) with
a utf8-encoded string, so supports structured types with any unicode field
names.
Notes
-----
The ``.npy`` format, including motivation for creating it and a comparison of
alternatives, is described in the
:doc:`"npy-format" NEP <neps:nep-0001-npy-format>`; however, details have
evolved with time and this document is more current.

"""
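The version 1.0 layout described above can be verified end to end with a short, self-contained sketch. It assumes only `numpy` and the standard library, and that the file was written by a NumPy recent enough (>= 1.14) to pad the preamble to 64 bytes:

```python
import ast
import io
import struct

import numpy as np

# Save a small array to an in-memory buffer, then walk the version 1.0
# layout by hand: magic string, version bytes, header length, header dict.
buf = io.BytesIO()
np.save(buf, np.arange(6, dtype='<i4').reshape(2, 3))
buf.seek(0)

magic_bytes = buf.read(6)                    # b'\x93NUMPY'
major, minor = buf.read(2)                   # unsigned version bytes
(hlen,) = struct.unpack('<H', buf.read(2))   # little-endian unsigned short
header = ast.literal_eval(buf.read(hlen).decode('latin1'))

assert magic_bytes == b'\x93NUMPY'
assert (major, minor) == (1, 0)
assert header == {'descr': '<i4', 'fortran_order': False, 'shape': (2, 3)}
# magic + version + length field + header is padded to a multiple of 64.
assert (6 + 2 + 2 + hlen) % 64 == 0
```

The remaining bytes of the buffer are exactly the 24 bytes of array data (6 elements times `itemsize` 4).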
import numpy
import warnings
from numpy.lib.utils import safe_eval
from numpy.compat import (
    isfileobj, os_fspath, pickle
    )


__all__ = []


EXPECTED_KEYS = {'descr', 'fortran_order', 'shape'}
MAGIC_PREFIX = b'\x93NUMPY'
MAGIC_LEN = len(MAGIC_PREFIX) + 2
ARRAY_ALIGN = 64  # plausible values are powers of 2 between 16 and 4096
BUFFER_SIZE = 2**18  # size of buffer for reading npz files in bytes

# difference between version 1.0 and 2.0 is a 4 byte (I) header length
# instead of 2 bytes (H) allowing storage of large structured arrays
_header_size_info = {
    (1, 0): ('<H', 'latin1'),
    (2, 0): ('<I', 'latin1'),
    (3, 0): ('<I', 'utf8'),
}

# Python's literal_eval is not actually safe for large inputs, since parsing
# may become slow or even cause interpreter crashes.
# This is an arbitrary, low limit which should make it safe in practice.
_MAX_HEADER_SIZE = 10000
def _check_version(version):
    if version not in [(1, 0), (2, 0), (3, 0), None]:
        msg = "we only support format version (1,0), (2,0), and (3,0), not %s"
        raise ValueError(msg % (version,))
def magic(major, minor):
    """ Return the magic string for the given file format version.

    Parameters
    ----------
    major : int in [0, 255]
    minor : int in [0, 255]

    Returns
    -------
    magic : str

    Raises
    ------
    ValueError if the version cannot be formatted.
    """
    if major < 0 or major > 255:
        raise ValueError("major version must be 0 <= major < 256")
    if minor < 0 or minor > 255:
        raise ValueError("minor version must be 0 <= minor < 256")
    return MAGIC_PREFIX + bytes([major, minor])
def read_magic(fp):
    """ Read the magic string to get the version of the file format.

    Parameters
    ----------
    fp : filelike object

    Returns
    -------
    major : int
    minor : int
    """
    magic_str = _read_bytes(fp, MAGIC_LEN, "magic string")
    if magic_str[:-2] != MAGIC_PREFIX:
        msg = "the magic string is not correct; expected %r, got %r"
        raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2]))
    major, minor = magic_str[-2:]
    return major, minor
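As a quick sanity check, `magic` and `read_magic` round-trip through an in-memory stream; a minimal sketch using the public `numpy.lib.format` API:

```python
import io
from numpy.lib.format import magic, read_magic

# Build the 8-byte preamble for format version 3.0 and read it back.
buf = io.BytesIO(magic(3, 0))
assert read_magic(buf) == (3, 0)

# The prefix is always the same 6 bytes; only the version bytes vary.
assert magic(1, 0)[:6] == magic(2, 0)[:6] == b'\x93NUMPY'
```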
def _has_metadata(dt):
    if dt.metadata is not None:
        return True
    elif dt.names is not None:
        return any(_has_metadata(dt[k]) for k in dt.names)
    elif dt.subdtype is not None:
        return _has_metadata(dt.base)
    else:
        return False
def dtype_to_descr(dtype):
    """
    Get a serializable descriptor from the dtype.

    The .descr attribute of a dtype object cannot be round-tripped through
    the dtype() constructor. Simple types, like dtype('float32'), have
    a descr which looks like a record array with one field with '' as
    a name. The dtype() constructor interprets this as a request to give
    a default name. Instead, we construct a descriptor that can be passed
    to dtype().

    Parameters
    ----------
    dtype : dtype
        The dtype of the array that will be written to disk.

    Returns
    -------
    descr : object
        An object that can be passed to `numpy.dtype()` in order to
        replicate the input dtype.

    """
    if _has_metadata(dtype):
        warnings.warn("metadata on a dtype may be saved or ignored, but will "
                      "raise if saved when read. Use another form of storage.",
                      UserWarning, stacklevel=2)
    if dtype.names is not None:
        # This is a record array. The .descr is fine. XXX: parts of the
        # record array with an empty name, like padding bytes, still get
        # fiddled with. This needs to be fixed in the C implementation of
        # dtype().
        return dtype.descr
    else:
        return dtype.str
def descr_to_dtype(descr):
    """
    Returns a dtype based off the given description.

    This is essentially the reverse of `dtype_to_descr()`. It will remove
    the valueless padding fields created by simple dtypes like
    dtype('float32'), and then convert the description to its corresponding
    dtype.

    Parameters
    ----------
    descr : object
        The object retrieved by dtype.descr. Can be passed to
        `numpy.dtype()` in order to replicate the input dtype.

    Returns
    -------
    dtype : dtype
        The dtype constructed by the description.

    """
    if isinstance(descr, str):
        # No padding removal needed
        return numpy.dtype(descr)
    elif isinstance(descr, tuple):
        # subtype, will always have a shape descr[1]
        dt = descr_to_dtype(descr[0])
        return numpy.dtype((dt, descr[1]))

    titles = []
    names = []
    formats = []
    offsets = []
    offset = 0
    for field in descr:
        if len(field) == 2:
            name, descr_str = field
            dt = descr_to_dtype(descr_str)
        else:
            name, descr_str, shape = field
            dt = numpy.dtype((descr_to_dtype(descr_str), shape))

        # Ignore padding bytes, which will be void bytes with '' as name.
        # Once support for blank names is removed, only "if name == ''" is
        # needed.
        is_pad = (name == '' and dt.type is numpy.void and dt.names is None)
        if not is_pad:
            title, name = name if isinstance(name, tuple) else (None, name)
            titles.append(title)
            names.append(name)
            formats.append(dt)
            offsets.append(offset)
        offset += dt.itemsize

    return numpy.dtype({'names': names, 'formats': formats, 'titles': titles,
                        'offsets': offsets, 'itemsize': offset})
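The asymmetry that `dtype_to_descr` and `descr_to_dtype` work around can be seen directly; a small sketch (note the exact byte-order character in `.str` depends on the platform, so the checks below avoid hard-coding it):

```python
import numpy as np
from numpy.lib.format import dtype_to_descr, descr_to_dtype

# A simple dtype: .descr looks like a one-field record with '' as the
# name, which np.dtype() would rename to 'f0'. The .str form round-trips.
dt = np.dtype('float32')
assert dt.descr[0][0] == ''          # empty field name in .descr
assert dtype_to_descr(dt) == dt.str  # unstructured dtypes use .str

# A structured dtype round-trips through the descr form.
rec = np.dtype([('a', '<i4'), ('b', '<f8')])
assert descr_to_dtype(dtype_to_descr(rec)) == rec
```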
def header_data_from_array_1_0(array):
    """ Get the dictionary of header metadata from a numpy.ndarray.

    Parameters
    ----------
    array : numpy.ndarray

    Returns
    -------
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    """
    d = {'shape': array.shape}
    if array.flags.c_contiguous:
        d['fortran_order'] = False
    elif array.flags.f_contiguous:
        d['fortran_order'] = True
    else:
        # Totally non-contiguous data. We will have to make it C-contiguous
        # before writing. Note that we need to test for C_CONTIGUOUS first
        # because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS.
        d['fortran_order'] = False

    d['descr'] = dtype_to_descr(array.dtype)
    return d
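The contiguity logic above can be exercised directly through the public `numpy.lib.format` API; a short sketch covering all three branches:

```python
import numpy as np
from numpy.lib.format import header_data_from_array_1_0

# Fortran-ordered 2-D data is flagged so it can be written without a copy.
f_arr = np.zeros((3, 4), order='F')
assert header_data_from_array_1_0(f_arr)['fortran_order'] is True

# A 1-D array is both C- and F-contiguous; the C branch is checked first,
# so the flag stays False.
one_d = np.zeros(5)
assert header_data_from_array_1_0(one_d)['fortran_order'] is False

# Non-contiguous data (e.g. a strided view) also reports False; it will
# be made C-contiguous before writing.
view = np.zeros((4, 4))[::2, ::2]
assert header_data_from_array_1_0(view)['fortran_order'] is False
```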
def _wrap_header(header, version):
    """
    Takes a stringified header, and attaches the prefix and padding to it
    """
    import struct
    assert version is not None
    fmt, encoding = _header_size_info[version]
    header = header.encode(encoding)
    hlen = len(header) + 1
    padlen = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize(fmt) + hlen) % ARRAY_ALIGN)
    try:
        header_prefix = magic(*version) + struct.pack(fmt, hlen + padlen)
    except struct.error:
        msg = "Header length {} too big for version={}".format(hlen, version)
        raise ValueError(msg) from None

    # Pad the header with spaces and a final newline such that the magic
    # string, the header-length short and the header are aligned on a
    # ARRAY_ALIGN byte boundary. This supports memory mapping of dtypes
    # aligned up to ARRAY_ALIGN on systems like Linux where mmap()
    # offset must be page-aligned (i.e. the beginning of the file).
    return header_prefix + header + b' '*padlen + b'\n'
def _wrap_header_guess_version(header):
    """
    Like `_wrap_header`, but chooses an appropriate version given the contents
    """
    try:
        return _wrap_header(header, (1, 0))
    except ValueError:
        pass

    try:
        ret = _wrap_header(header, (2, 0))
    except UnicodeEncodeError:
        pass
    else:
        warnings.warn("Stored array in format 2.0. It can only be "
                      "read by NumPy >= 1.9", UserWarning, stacklevel=2)
        return ret

    header = _wrap_header(header, (3, 0))
    warnings.warn("Stored array in format 3.0. It can only be "
                  "read by NumPy >= 1.17", UserWarning, stacklevel=2)
    return header
def _write_array_header(fp, d, version=None):
    """ Write the header for an array to a filelike object.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    version : tuple or None
        None means use oldest that works. Providing an explicit version will
        raise a ValueError if the format does not allow saving this data.
        Default: None
    """
    header = ["{"]
    for key, value in sorted(d.items()):
        # Need to use repr here, since we eval these when reading
        header.append("'%s': %s, " % (key, repr(value)))
    header.append("}")
    header = "".join(header)
    if version is None:
        header = _wrap_header_guess_version(header)
    else:
        header = _wrap_header(header, version)
    fp.write(header)
def write_array_header_1_0(fp, d):
    """ Write the header for an array using the 1.0 format.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (1, 0))


def write_array_header_2_0(fp, d):
    """ Write the header for an array using the 2.0 format.
        The 2.0 format allows storing very large structured arrays.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (2, 0))
def read_array_header_1_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 1.0 file format
    version.

    This will leave the file object located just after the header.

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        The array data will be written out directly if it is either
        C-contiguous or Fortran-contiguous. Otherwise, it will be made
        contiguous before writing it out.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(
        fp, version=(1, 0), max_header_size=max_header_size)
def read_array_header_2_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 2.0 file format
    version.

    This will leave the file object located just after the header.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        The array data will be written out directly if it is either
        C-contiguous or Fortran-contiguous. Otherwise, it will be made
        contiguous before writing it out.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(
        fp, version=(2, 0), max_header_size=max_header_size)
def _filter_header(s):
    """Clean up 'L' in npz header ints.

    Cleans up the 'L' in strings representing integers. Needed to allow npz
    headers produced in Python 2 to be read in Python 3.

    Parameters
    ----------
    s : string
        Npy file header.

    Returns
    -------
    header : str
        Cleaned up header.

    """
    import tokenize
    from io import StringIO

    tokens = []
    last_token_was_number = False
    for token in tokenize.generate_tokens(StringIO(s).readline):
        token_type = token[0]
        token_string = token[1]
        if (last_token_was_number and
                token_type == tokenize.NAME and
                token_string == "L"):
            continue
        else:
            tokens.append(token)
        last_token_was_number = (token_type == tokenize.NUMBER)
    return tokenize.untokenize(tokens)
def _read_array_header(fp, version, max_header_size=_MAX_HEADER_SIZE):
    """
    see read_array_header_1_0
    """
    # Read an unsigned, little-endian short int which has the length of the
    # header.
    import struct
    hinfo = _header_size_info.get(version)
    if hinfo is None:
        raise ValueError("Invalid version {!r}".format(version))
    hlength_type, encoding = hinfo

    hlength_str = _read_bytes(fp, struct.calcsize(hlength_type), "array header length")
    header_length = struct.unpack(hlength_type, hlength_str)[0]
    header = _read_bytes(fp, header_length, "array header")
    header = header.decode(encoding)
    if len(header) > max_header_size:
        raise ValueError(
            f"Header info length ({len(header)}) is large and may not be safe "
            "to load securely.\n"
            "To allow loading, adjust `max_header_size` or fully trust "
            "the `.npy` file using `allow_pickle=True`.\n"
            "For safety against large resource use or crashes, sandboxing "
            "may be necessary.")

    # The header is a pretty-printed string representation of a literal
    # Python dictionary with trailing newlines padded to an ARRAY_ALIGN byte
    # boundary. The keys are strings.
    #   "shape" : tuple of int
    #   "fortran_order" : bool
    #   "descr" : dtype.descr
    # Versions (2, 0) and (1, 0) could have been created by a Python 2
    # implementation before header filtering was implemented.
    if version <= (2, 0):
        header = _filter_header(header)
    try:
        d = safe_eval(header)
    except SyntaxError as e:
        msg = "Cannot parse header: {!r}"
        raise ValueError(msg.format(header)) from e
    if not isinstance(d, dict):
        msg = "Header is not a dictionary: {!r}"
        raise ValueError(msg.format(d))

    if EXPECTED_KEYS != d.keys():
        keys = sorted(d.keys())
        msg = "Header does not contain the correct keys: {!r}"
        raise ValueError(msg.format(keys))

    # Sanity-check the values.
    if (not isinstance(d['shape'], tuple) or
            not all(isinstance(x, int) for x in d['shape'])):
        msg = "shape is not valid: {!r}"
        raise ValueError(msg.format(d['shape']))
    if not isinstance(d['fortran_order'], bool):
        msg = "fortran_order is not a valid bool: {!r}"
        raise ValueError(msg.format(d['fortran_order']))
    try:
        dtype = descr_to_dtype(d['descr'])
    except TypeError as e:
        msg = "descr is not a valid dtype descriptor: {!r}"
        raise ValueError(msg.format(d['descr'])) from e

    return d['shape'], d['fortran_order'], dtype
def write_array(fp, array, version=None, allow_pickle=True, pickle_kwargs=None):
    """
    Write an array to an NPY file, including a header.

    If the array is neither C-contiguous nor Fortran-contiguous AND the
    file_like object is not a real file object, this function will have to
    copy data in memory.

    Parameters
    ----------
    fp : file_like object
        An open, writable file object, or similar object with a
        ``.write()`` method.
    array : ndarray
        The array to write to disk.
    version : (int, int) or None, optional
        The version number of the format. None means use the oldest
        supported version that is able to store the data. Default: None
    allow_pickle : bool, optional
        Whether to allow writing pickled data. Default: True
    pickle_kwargs : dict, optional
        Additional keyword arguments to pass to pickle.dump, excluding
        'protocol'. These are only useful when pickling objects in object
        arrays on Python 3 to Python 2 compatible format.

    Raises
    ------
    ValueError
        If the array cannot be persisted. This includes the case of
        allow_pickle=False and array being an object array.
    Various other errors
        If the array contains Python objects as part of its dtype, the
        process of pickling them may raise various errors if the objects
        are not picklable.

    """
    _check_version(version)
    _write_array_header(fp, header_data_from_array_1_0(array), version)

    if array.itemsize == 0:
        buffersize = 0
    else:
        # Set buffer size to 16 MiB to hide the Python loop overhead.
        buffersize = max(16 * 1024 ** 2 // array.itemsize, 1)

    if array.dtype.hasobject:
        # We contain Python objects so we cannot write out the data
        # directly. Instead, we will pickle it out
        if not allow_pickle:
            raise ValueError("Object arrays cannot be saved when "
                             "allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        pickle.dump(array, fp, protocol=3, **pickle_kwargs)
    elif array.flags.f_contiguous and not array.flags.c_contiguous:
        if isfileobj(fp):
            array.T.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='F'):
                fp.write(chunk.tobytes('C'))
    else:
        if isfileobj(fp):
            array.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='C'):
                fp.write(chunk.tobytes('C'))
def read_array(fp, allow_pickle=False, pickle_kwargs=None, *,
               max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array from an NPY file.

    Parameters
    ----------
    fp : file_like object
        If this is not a real file object, then this may take extra memory
        and time.
    allow_pickle : bool, optional
        Whether to allow reading pickled data. Default: False

        .. versionchanged:: 1.16.3
            Made default False in response to CVE-2019-6446.

    pickle_kwargs : dict
        Additional keyword arguments to pass to pickle.load. These are only
        useful when loading object arrays saved on Python 2 when using
        Python 3.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval` for details.
        This option is ignored when `allow_pickle` is passed. In that case
        the file is by definition trusted and the limit is unnecessary.

    Returns
    -------
    array : ndarray
        The array from the data on disk.

    Raises
    ------
    ValueError
        If the data is invalid, or allow_pickle=False and the file contains
        an object array.

    """
    if allow_pickle:
        # Effectively ignore max_header_size, since `allow_pickle` indicates
        # that the input is fully trusted.
        max_header_size = 2**64

    version = read_magic(fp)
    _check_version(version)
    shape, fortran_order, dtype = _read_array_header(
        fp, version, max_header_size=max_header_size)
    if len(shape) == 0:
        count = 1
    else:
        count = numpy.multiply.reduce(shape, dtype=numpy.int64)

    # Now read the actual data.
    if dtype.hasobject:
        # The array contained Python objects. We need to unpickle the data.
        if not allow_pickle:
            raise ValueError("Object arrays cannot be loaded when "
                             "allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        try:
            array = pickle.load(fp, **pickle_kwargs)
        except UnicodeError as err:
            # Friendlier error message
            raise UnicodeError("Unpickling a python object failed: %r\n"
                               "You may need to pass the encoding= option "
                               "to numpy.load" % (err,)) from err
    else:
        if isfileobj(fp):
            # We can use the fast fromfile() function.
            array = numpy.fromfile(fp, dtype=dtype, count=count)
        else:
            # This is not a real file. We have to read it the
            # memory-intensive way.
            # crc32 module fails on reads greater than 2 ** 32 bytes,
            # breaking large reads from gzip streams. Chunk reads to
            # BUFFER_SIZE bytes to avoid issue and reduce memory overhead
            # of the read. In non-chunked case count < max_read_count, so
            # only one read is performed.

            # Use np.ndarray instead of np.empty since the latter does
            # not correctly instantiate zero-width string dtypes; see
            # https://github.com/numpy/numpy/pull/6430
            array = numpy.ndarray(count, dtype=dtype)

            if dtype.itemsize > 0:
                # If dtype.itemsize == 0 then there's nothing more to read
                max_read_count = BUFFER_SIZE // min(BUFFER_SIZE, dtype.itemsize)

                for i in range(0, count, max_read_count):
                    read_count = min(max_read_count, count - i)
                    read_size = int(read_count * dtype.itemsize)
                    data = _read_bytes(fp, read_size, "array data")
                    array[i:i+read_count] = numpy.frombuffer(data, dtype=dtype,
                                                             count=read_count)

        if fortran_order:
            array.shape = shape[::-1]
            array = array.transpose()
        else:
            array.shape = shape

    return array
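`write_array` and `read_array` round-trip through any file-like object, including an in-memory buffer; a minimal sketch that also exercises the `fortran_order` branch:

```python
import io

import numpy as np
from numpy.lib.format import read_array, write_array

arr = np.arange(12, dtype='<f8').reshape(3, 4)
buf = io.BytesIO()
write_array(buf, np.asfortranarray(arr))  # F-contiguous input
buf.seek(0)

out = read_array(buf)
assert np.array_equal(out, arr)
assert out.flags.f_contiguous  # memory order survives the round trip
```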
def open_memmap(filename, mode='r+', dtype=None, shape=None,
                fortran_order=False, version=None, *,
                max_header_size=_MAX_HEADER_SIZE):
    """
    Open a .npy file as a memory-mapped array.

    This may be used to read an existing file or create a new one.

    Parameters
    ----------
    filename : str or path-like
        The name of the file on disk. This may *not* be a file-like
        object.
    mode : str, optional
        The mode in which to open the file; the default is 'r+'. In
        addition to the standard file modes, 'c' is also accepted to mean
        "copy on write." See `memmap` for the available mode strings.
    dtype : data-type, optional
        The data type of the array if we are creating a new file in "write"
        mode; if not, `dtype` is ignored. The default value is None, which
        results in a data-type of `float64`.
    shape : tuple of int
        The shape of the array if we are creating a new file in "write"
        mode, in which case this parameter is required. Otherwise, this
        parameter is ignored and is thus optional.
    fortran_order : bool, optional
        Whether the array should be Fortran-contiguous (True) or
        C-contiguous (False, the default) if we are creating a new file in
        "write" mode.
    version : tuple of int (major, minor) or None
        If the mode is a "write" mode, then this is the version of the file
        format used to create the file. None means use the oldest
        supported version that is able to store the data. Default: None
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval` for details.

    Returns
    -------
    marray : memmap
        The memory-mapped array.

    Raises
    ------
    ValueError
        If the data or the mode is invalid.
    OSError
        If the file is not found or cannot be opened correctly.

    See Also
    --------
    numpy.memmap

    """
    if isfileobj(filename):
        raise ValueError("Filename must be a string or a path-like object."
                         " Memmap cannot use existing file handles.")

    if 'w' in mode:
        # We are creating the file, not reading it.
        # Check if we ought to create the file.
        _check_version(version)
        # Ensure that the given dtype is an authentic dtype object rather
        # than just something that can be interpreted as a dtype object.
        dtype = numpy.dtype(dtype)
        if dtype.hasobject:
            msg = "Array can't be memory-mapped: Python objects in dtype."
            raise ValueError(msg)
        d = dict(
            descr=dtype_to_descr(dtype),
            fortran_order=fortran_order,
            shape=shape,
        )
        # If we got here, then it should be safe to create the file.
        with open(os_fspath(filename), mode+'b') as fp:
            _write_array_header(fp, d, version)
            offset = fp.tell()
    else:
        # Read the header of the file first.
        with open(os_fspath(filename), 'rb') as fp:
            version = read_magic(fp)
            _check_version(version)

            shape, fortran_order, dtype = _read_array_header(
                fp, version, max_header_size=max_header_size)
            if dtype.hasobject:
                msg = "Array can't be memory-mapped: Python objects in dtype."
                raise ValueError(msg)
            offset = fp.tell()

    if fortran_order:
        order = 'F'
    else:
        order = 'C'

    # We need to change a write-only mode to a read-write mode since we've
    # already written data to the file.
    if mode == 'w+':
        mode = 'r+'

    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
                          mode=mode, offset=offset)

    return marray
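In a "write" mode, `open_memmap` creates the file, writes the header, and hands back a `numpy.memmap` positioned at the data offset; the result is an ordinary `.npy` file. A sketch using a temporary directory:

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.npy')

    # 'w+' creates the file and writes the header; the returned memmap
    # starts at the (64-byte aligned) data offset.
    m = open_memmap(path, mode='w+', dtype='<i4', shape=(2, 5))
    m[:] = np.arange(10, dtype='<i4').reshape(2, 5)
    m.flush()
    del m  # release the mapping before reopening

    # The file is readable with the ordinary loader.
    back = np.load(path)
    assert back.shape == (2, 5)
    assert int(back[1, 4]) == 9
```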
def _read_bytes(fp, size, error_template="ran out of data"):
    """
    Read from file-like object until size bytes are read.
    Raises ValueError if EOF is encountered before size bytes are read.
    Non-blocking objects only supported if they derive from io objects.

    Required as e.g. ZipExtFile in python 2.6 can return less data than
    requested.
    """
    data = bytes()
    while True:
        # io files (default in python3) return None or raise on
        # would-block, python2 file will truncate, probably nothing can be
        # done about that. note that regular files can't be non-blocking
        try:
            r = fp.read(size - len(data))
            data += r
            if len(r) == 0 or len(data) == size:
                break
        except BlockingIOError:
            pass
    if len(data) != size:
        msg = "EOF: reading %s, expected %d bytes got %d"
        raise ValueError(msg % (error_template, size, len(data)))
    else:
        return data