"""
Binary serialization

NPY format
==========

A simple format for saving numpy arrays to disk with the full
information about them.

The ``.npy`` format is the standard binary file format in NumPy for
persisting a *single* arbitrary NumPy array on disk. The format stores all
of the shape and dtype information necessary to reconstruct the array
correctly even on another machine with a different architecture.
The format is designed to be as simple as possible while achieving
its limited goals.

The ``.npz`` format is the standard format for persisting *multiple* NumPy
arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy``
files, one for each array.

Capabilities
------------

- Can represent all NumPy arrays including nested record arrays and
  object arrays.

- Represents the data in its native binary form.

- Supports Fortran-contiguous arrays directly.

- Stores all of the necessary information to reconstruct the array
  including shape and dtype on a machine of a different
  architecture. Both little-endian and big-endian arrays are
  supported, and a file with little-endian numbers will yield
  a little-endian array on any machine reading the file. The
  types are described in terms of their actual sizes. For example,
  if a machine with a 64-bit C "long int" writes out an array with
  "long ints", a reading machine with 32-bit C "long ints" will yield
  an array with 64-bit integers.

- Is straightforward to reverse engineer. Datasets often live longer than
  the programs that created them. A competent developer should be
  able to create a solution in their preferred programming language to
  read most ``.npy`` files that they have been given without much
  documentation.

- Allows memory-mapping of the data. See `open_memmap`.

- Can be read from a filelike stream object instead of an actual file.

- Stores object arrays, i.e. arrays containing elements that are arbitrary
  Python objects. Files with object arrays cannot be mmapped, but they
  can be read and written to disk.

Limitations
-----------

- Arbitrary subclasses of numpy.ndarray are not completely preserved.
  Subclasses will be accepted for writing, but only the array data will
  be written out. A regular numpy.ndarray object will be created
  upon reading the file.

.. warning::

  Due to limitations in the interpretation of structured dtypes, dtypes
  with fields with empty names will have the names replaced by 'f0', 'f1',
  etc. Such arrays will not round-trip through the format entirely
  accurately. The data is intact; only the field names will differ. We are
  working on a fix for this. This fix will not require a change in the
  file format. The arrays with such structures can still be saved and
  restored, and the correct dtype may be restored by using the
  ``loadedarray.view(correct_dtype)`` method.

File extensions
---------------

We recommend using the ``.npy`` and ``.npz`` extensions for files saved
in this format. This is by no means a requirement; applications may wish
to use these file formats but use an extension specific to the
application. In the absence of an obvious alternative, however,
we suggest using ``.npy`` and ``.npz``.

Version numbering
-----------------

The version numbering of these formats is independent of NumPy version
numbering. If the format is upgraded, the code in `numpy.io` will still
be able to read and write Version 1.0 files.

Format Version 1.0
------------------

The first 6 bytes are a magic string: exactly ``\\x93NUMPY``.

The next 1 byte is an unsigned byte: the major version number of the file
format, e.g. ``\\x01``.

The next 1 byte is an unsigned byte: the minor version number of the file
format, e.g. ``\\x00``. Note: the version of the file format is not tied
to the version of the numpy package.

The next 2 bytes form a little-endian unsigned short int: the length of
the header data HEADER_LEN.

The next HEADER_LEN bytes form the header data describing the array's
format. It is an ASCII string which contains a Python literal expression
of a dictionary. It is terminated by a newline (``\\n``) and padded with
spaces (``\\x20``) to make the total of
``len(magic string) + 2 + len(length) + HEADER_LEN`` be evenly divisible
by 64 for alignment purposes.

The dictionary contains three keys:

    "descr" : dtype.descr
      An object that can be passed as an argument to the `numpy.dtype`
      constructor to create the array's dtype.
    "fortran_order" : bool
      Whether the array data is Fortran-contiguous or not. Since
      Fortran-contiguous arrays are a common form of non-C-contiguity,
      we allow them to be written directly to disk for efficiency.
    "shape" : tuple of int
      The shape of the array.

For repeatability and readability, the dictionary keys are sorted in
alphabetic order. This is for convenience only. A writer SHOULD implement
this if possible. A reader MUST NOT depend on this.

Following the header comes the array data. If the dtype contains Python
objects (i.e. ``dtype.hasobject is True``), then the data is a Python
pickle of the array. Otherwise the data is the contiguous (either C-
or Fortran-, depending on ``fortran_order``) bytes of the array.
Consumers can figure out the number of bytes by multiplying the number
of elements given by the shape (noting that ``shape=()`` means there is
1 element) by ``dtype.itemsize``.
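
For concreteness, the start of the file written for a small C-contiguous
float64 array looks as follows (an illustrative sketch; the exact bytes
assume a little-endian machine, where the descr is ``'<f8'``):

>>> import io
>>> import numpy
>>> f = io.BytesIO()
>>> numpy.save(f, numpy.zeros((2, 3)))
>>> f.getvalue()[:10]    # magic string, version (1, 0), HEADER_LEN
b'\\x93NUMPY\\x01\\x00v\\x00'
>>> f.getvalue()[10:69]  # the header dictionary itself
b"{'descr': '<f8', 'fortran_order': False, 'shape': (2, 3), }"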

Format Version 2.0
------------------

The version 1.0 format only allowed the array header to have a total size of
65535 bytes. This can be exceeded by structured arrays with a large number of
columns. The version 2.0 format extends the header size to 4 GiB.
`numpy.save` will automatically save in 2.0 format if the data requires it,
else it will always use the more compatible 1.0 format.

The description of the fourth element of the header therefore has become:
"The next 4 bytes form a little-endian unsigned int: the length of the header
data HEADER_LEN."

Format Version 3.0
------------------

This version replaces the ASCII string (which in practice was latin1) with
a utf8-encoded string, so it supports structured types with any unicode field
names.

Notes
-----
The ``.npy`` format, including the motivation for creating it and a comparison
of alternatives, is described in the
:doc:`"npy-format" NEP <neps:nep-0001-npy-format>`; however, details have
evolved with time and this document is more current.

"""

import numpy
import warnings
from numpy.lib.utils import safe_eval
from numpy.compat import (
    isfileobj, os_fspath, pickle
    )


__all__ = []


EXPECTED_KEYS = {'descr', 'fortran_order', 'shape'}
MAGIC_PREFIX = b'\x93NUMPY'
MAGIC_LEN = len(MAGIC_PREFIX) + 2
ARRAY_ALIGN = 64  # plausible values are powers of 2 between 16 and 4096
BUFFER_SIZE = 2**18  # size of buffer for reading npz files in bytes

# difference between version 1.0 and 2.0 is a 4 byte (I) header length
# instead of 2 bytes (H) allowing storage of large structured arrays
_header_size_info = {
    (1, 0): ('<H', 'latin1'),
    (2, 0): ('<I', 'latin1'),
    (3, 0): ('<I', 'utf8'),
}

# Python's literal_eval is not actually safe for large inputs, since parsing
# may become slow or even cause interpreter crashes.
# This is an arbitrary, low limit which should make it safe in practice.
_MAX_HEADER_SIZE = 10000

def _check_version(version):
    if version not in [(1, 0), (2, 0), (3, 0), None]:
        msg = "we only support format versions (1,0), (2,0), and (3,0), not %s"
        raise ValueError(msg % (version,))

def magic(major, minor):
    """ Return the magic string for the given file format version.

    Parameters
    ----------
    major : int in [0, 255]
    minor : int in [0, 255]

    Returns
    -------
    magic : bytes

    Raises
    ------
    ValueError if the version cannot be formatted.
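
    Examples
    --------
    A quick doctest-style sketch of the version (1, 0) magic string:

    >>> magic(1, 0)
    b'\\x93NUMPY\\x01\\x00'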

    """
    if major < 0 or major > 255:
        raise ValueError("major version must be 0 <= major < 256")
    if minor < 0 or minor > 255:
        raise ValueError("minor version must be 0 <= minor < 256")
    return MAGIC_PREFIX + bytes([major, minor])

def read_magic(fp):
    """ Read the magic string to get the version of the file format.

    Parameters
    ----------
    fp : filelike object

    Returns
    -------
    major : int
    minor : int
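
    Examples
    --------
    A sketch reading the version back from a freshly saved array:

    >>> import io
    >>> import numpy
    >>> fp = io.BytesIO()
    >>> numpy.save(fp, numpy.zeros(3))
    >>> _ = fp.seek(0)
    >>> read_magic(fp)
    (1, 0)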

    """
    magic_str = _read_bytes(fp, MAGIC_LEN, "magic string")
    if magic_str[:-2] != MAGIC_PREFIX:
        msg = "the magic string is not correct; expected %r, got %r"
        raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2]))
    major, minor = magic_str[-2:]
    return major, minor

def _has_metadata(dt):
    if dt.metadata is not None:
        return True
    elif dt.names is not None:
        return any(_has_metadata(dt[k]) for k in dt.names)
    elif dt.subdtype is not None:
        return _has_metadata(dt.base)
    else:
        return False

def dtype_to_descr(dtype):
    """
    Get a serializable descriptor from the dtype.

    The .descr attribute of a dtype object cannot be round-tripped through
    the dtype() constructor. Simple types, like dtype('float32'), have
    a descr which looks like a record array with one field with '' as
    a name. The dtype() constructor interprets this as a request to give
    a default name. Instead, we construct a descriptor that can be passed
    to dtype().

    Parameters
    ----------
    dtype : dtype
        The dtype of the array that will be written to disk.

    Returns
    -------
    descr : object
        An object that can be passed to `numpy.dtype()` in order to
        replicate the input dtype.
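
    Examples
    --------
    Two illustrative sketches (the ``'<'`` byte order assumes a
    little-endian machine):

    >>> import numpy
    >>> dtype_to_descr(numpy.dtype('float32'))
    '<f4'
    >>> dtype_to_descr(numpy.dtype([('x', '<f8'), ('y', '<i4')]))
    [('x', '<f8'), ('y', '<i4')]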

    """
    if _has_metadata(dtype):
        warnings.warn("metadata on a dtype may be saved or ignored, but will "
                      "raise if saved when read. Use another form of storage.",
                      UserWarning, stacklevel=2)
    if dtype.names is not None:
        # This is a record array. The .descr is fine. XXX: parts of the
        # record array with an empty name, like padding bytes, still get
        # fiddled with. This needs to be fixed in the C implementation of
        # dtype().
        return dtype.descr
    else:
        return dtype.str

def descr_to_dtype(descr):
    """
    Returns a dtype based on the given description.

    This is essentially the reverse of `dtype_to_descr()`. It will remove
    any valueless padding fields (created by, e.g., simple fields like
    ``dtype('float32')``) and then convert the description to its
    corresponding dtype.

    Parameters
    ----------
    descr : object
        The object retrieved by dtype.descr. Can be passed to
        `numpy.dtype()` in order to replicate the input dtype.

    Returns
    -------
    dtype : dtype
        The dtype constructed by the description.
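
    Examples
    --------
    Two round-trip sketches (the first output assumes a little-endian
    machine, where ``'<f4'`` is the native float32):

    >>> import numpy
    >>> descr_to_dtype('<f4')
    dtype('float32')
    >>> descr_to_dtype(('<f4', (2, 2)))
    dtype(('<f4', (2, 2)))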

    """
    if isinstance(descr, str):
        # No padding removal needed
        return numpy.dtype(descr)
    elif isinstance(descr, tuple):
        # subtype, will always have a shape descr[1]
        dt = descr_to_dtype(descr[0])
        return numpy.dtype((dt, descr[1]))

    titles = []
    names = []
    formats = []
    offsets = []
    offset = 0
    for field in descr:
        if len(field) == 2:
            name, descr_str = field
            dt = descr_to_dtype(descr_str)
        else:
            name, descr_str, shape = field
            dt = numpy.dtype((descr_to_dtype(descr_str), shape))

        # Ignore padding bytes, which will be void bytes with '' as name.
        # Once support for blank names is removed, only "if name == ''" is
        # needed.
        is_pad = (name == '' and dt.type is numpy.void and dt.names is None)
        if not is_pad:
            title, name = name if isinstance(name, tuple) else (None, name)
            titles.append(title)
            names.append(name)
            formats.append(dt)
            offsets.append(offset)
        offset += dt.itemsize

    return numpy.dtype({'names': names, 'formats': formats, 'titles': titles,
                        'offsets': offsets, 'itemsize': offset})

def header_data_from_array_1_0(array):
    """ Get the dictionary of header metadata from a numpy.ndarray.

    Parameters
    ----------
    array : numpy.ndarray

    Returns
    -------
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
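
    Examples
    --------
    A sketch for a small C-contiguous array (the ``'<f8'`` descr assumes a
    little-endian machine):

    >>> import numpy
    >>> header_data_from_array_1_0(numpy.zeros((3, 4)))
    {'shape': (3, 4), 'fortran_order': False, 'descr': '<f8'}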

    """
    d = {'shape': array.shape}
    if array.flags.c_contiguous:
        d['fortran_order'] = False
    elif array.flags.f_contiguous:
        d['fortran_order'] = True
    else:
        # Totally non-contiguous data. We will have to make it C-contiguous
        # before writing. Note that we need to test for C_CONTIGUOUS first
        # because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS.
        d['fortran_order'] = False

    d['descr'] = dtype_to_descr(array.dtype)
    return d

def _wrap_header(header, version):
    """
    Takes a stringified header, and attaches the prefix and padding to it
    """
    import struct
    assert version is not None
    fmt, encoding = _header_size_info[version]
    header = header.encode(encoding)
    hlen = len(header) + 1
    padlen = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize(fmt) + hlen) % ARRAY_ALIGN)
    try:
        header_prefix = magic(*version) + struct.pack(fmt, hlen + padlen)
    except struct.error:
        msg = "Header length {} too big for version={}".format(hlen, version)
        raise ValueError(msg) from None

    # Pad the header with spaces and a final newline such that the magic
    # string, the header-length short and the header are aligned on an
    # ARRAY_ALIGN byte boundary. This supports memory mapping of dtypes
    # aligned up to ARRAY_ALIGN on systems like Linux where mmap()
    # offset must be page-aligned (i.e. the beginning of the file).
    return header_prefix + header + b' '*padlen + b'\n'

def _wrap_header_guess_version(header):
    """
    Like `_wrap_header`, but chooses an appropriate version given the contents
    """
    try:
        return _wrap_header(header, (1, 0))
    except ValueError:
        pass

    try:
        ret = _wrap_header(header, (2, 0))
    except UnicodeEncodeError:
        pass
    else:
        warnings.warn("Stored array in format 2.0. It can only be "
                      "read by NumPy >= 1.9", UserWarning, stacklevel=2)
        return ret

    header = _wrap_header(header, (3, 0))
    warnings.warn("Stored array in format 3.0. It can only be "
                  "read by NumPy >= 1.17", UserWarning, stacklevel=2)
    return header

def _write_array_header(fp, d, version=None):
    """ Write the header for an array to a filelike object.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    version : tuple or None
        None means use the oldest version that works. Providing an explicit
        version will raise a ValueError if the format does not allow saving
        this data. Default: None
    """
    header = ["{"]
    for key, value in sorted(d.items()):
        # Need to use repr here, since we eval these when reading
        header.append("'%s': %s, " % (key, repr(value)))
    header.append("}")
    header = "".join(header)
    if version is None:
        header = _wrap_header_guess_version(header)
    else:
        header = _wrap_header(header, version)
    fp.write(header)

def write_array_header_1_0(fp, d):
    """ Write the header for an array using the 1.0 format.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
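
    Examples
    --------
    A sketch showing that the written header is padded out to a multiple
    of ``ARRAY_ALIGN`` (64) bytes:

    >>> import io
    >>> fp = io.BytesIO()
    >>> write_array_header_1_0(
    ...     fp, {'descr': '<f8', 'fortran_order': False, 'shape': (3, 4)})
    >>> len(fp.getvalue()) % 64
    0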

    """
    _write_array_header(fp, d, (1, 0))


def write_array_header_2_0(fp, d):
    """ Write the header for an array using the 2.0 format.
    The 2.0 format allows storing very large structured arrays.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (2, 0))

def read_array_header_1_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 1.0 file format
    version.

    This will leave the file object located just after the header.

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        Whether the array data is Fortran-contiguous or not.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.
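
    Examples
    --------
    A sketch reading the header back from a freshly saved array:

    >>> import io
    >>> import numpy
    >>> fp = io.BytesIO()
    >>> numpy.save(fp, numpy.zeros((2, 3)))
    >>> _ = fp.seek(0)
    >>> version = read_magic(fp)
    >>> read_array_header_1_0(fp)
    ((2, 3), False, dtype('float64'))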

    """
    return _read_array_header(
        fp, version=(1, 0), max_header_size=max_header_size)

def read_array_header_2_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 2.0 file format
    version.

    This will leave the file object located just after the header.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        Whether the array data is Fortran-contiguous or not.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(
        fp, version=(2, 0), max_header_size=max_header_size)

def _filter_header(s):
    """Clean up 'L' in npz header ints.

    Cleans up the 'L' in strings representing integers. Needed to allow npz
    headers produced in Python 2 to be read in Python 3.

    Parameters
    ----------
    s : string
        Npy file header.

    Returns
    -------
    header : str
        Cleaned up header.

    """
    import tokenize
    from io import StringIO

    tokens = []
    last_token_was_number = False
    for token in tokenize.generate_tokens(StringIO(s).readline):
        token_type = token[0]
        token_string = token[1]
        if (last_token_was_number and
                token_type == tokenize.NAME and
                token_string == "L"):
            continue
        else:
            tokens.append(token)
        last_token_was_number = (token_type == tokenize.NUMBER)
    return tokenize.untokenize(tokens)

def _read_array_header(fp, version, max_header_size=_MAX_HEADER_SIZE):
    """
    see read_array_header_1_0
    """
    # Read an unsigned, little-endian short int (or, for format versions
    # >= 2.0, a little-endian unsigned int) which has the length of the
    # header.
    import struct
    hinfo = _header_size_info.get(version)
    if hinfo is None:
        raise ValueError("Invalid version {!r}".format(version))
    hlength_type, encoding = hinfo

    hlength_str = _read_bytes(fp, struct.calcsize(hlength_type), "array header length")
    header_length = struct.unpack(hlength_type, hlength_str)[0]
    header = _read_bytes(fp, header_length, "array header")
    header = header.decode(encoding)
    if len(header) > max_header_size:
        raise ValueError(
            f"Header info length ({len(header)}) is large and may not be safe "
            "to load securely.\n"
            "To allow loading, adjust `max_header_size` or fully trust "
            "the `.npy` file using `allow_pickle=True`.\n"
            "For safety against large resource use or crashes, sandboxing "
            "may be necessary.")

    # The header is a pretty-printed string representation of a literal
    # Python dictionary with trailing newlines padded to an ARRAY_ALIGN byte
    # boundary. The keys are strings.
    #   "shape" : tuple of int
    #   "fortran_order" : bool
    #   "descr" : dtype.descr
    # Versions (2, 0) and (1, 0) could have been created by a Python 2
    # implementation before header filtering was implemented.
    if version <= (2, 0):
        header = _filter_header(header)
    try:
        d = safe_eval(header)
    except SyntaxError as e:
        msg = "Cannot parse header: {!r}"
        raise ValueError(msg.format(header)) from e
    if not isinstance(d, dict):
        msg = "Header is not a dictionary: {!r}"
        raise ValueError(msg.format(d))

    if EXPECTED_KEYS != d.keys():
        keys = sorted(d.keys())
        msg = "Header does not contain the correct keys: {!r}"
        raise ValueError(msg.format(keys))

    # Sanity-check the values.
    if (not isinstance(d['shape'], tuple) or
            not all(isinstance(x, int) for x in d['shape'])):
        msg = "shape is not valid: {!r}"
        raise ValueError(msg.format(d['shape']))
    if not isinstance(d['fortran_order'], bool):
        msg = "fortran_order is not a valid bool: {!r}"
        raise ValueError(msg.format(d['fortran_order']))
    try:
        dtype = descr_to_dtype(d['descr'])
    except TypeError as e:
        msg = "descr is not a valid dtype descriptor: {!r}"
        raise ValueError(msg.format(d['descr'])) from e

    return d['shape'], d['fortran_order'], dtype

def write_array(fp, array, version=None, allow_pickle=True, pickle_kwargs=None):
    """
    Write an array to an NPY file, including a header.

    If the array is neither C-contiguous nor Fortran-contiguous AND the
    file_like object is not a real file object, this function will have to
    copy data in memory.

    Parameters
    ----------
    fp : file_like object
        An open, writable file object, or similar object with a
        ``.write()`` method.
    array : ndarray
        The array to write to disk.
    version : (int, int) or None, optional
        The version number of the format. None means use the oldest
        supported version that is able to store the data. Default: None
    allow_pickle : bool, optional
        Whether to allow writing pickled data. Default: True
    pickle_kwargs : dict, optional
        Additional keyword arguments to pass to pickle.dump, excluding
        'protocol'. These are only useful when pickling objects in object
        arrays on Python 3 to a Python 2 compatible format.

    Raises
    ------
    ValueError
        If the array cannot be persisted. This includes the case of
        allow_pickle=False and array being an object array.
    Various other errors
        If the array contains Python objects as part of its dtype, the
        process of pickling them may raise various errors if the objects
        are not picklable.
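
    Examples
    --------
    A minimal sketch writing to an in-memory buffer:

    >>> import io
    >>> import numpy
    >>> fp = io.BytesIO()
    >>> write_array(fp, numpy.ones(3), version=(1, 0))
    >>> fp.getvalue()[:8]   # magic string plus version bytes
    b'\\x93NUMPY\\x01\\x00'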

    """
    _check_version(version)
    _write_array_header(fp, header_data_from_array_1_0(array), version)

    if array.itemsize == 0:
        buffersize = 0
    else:
        # Set buffer size to 16 MiB to hide the Python loop overhead.
        buffersize = max(16 * 1024 ** 2 // array.itemsize, 1)

    if array.dtype.hasobject:
        # We contain Python objects so we cannot write out the data
        # directly. Instead, we will pickle it out
        if not allow_pickle:
            raise ValueError("Object arrays cannot be saved when "
                             "allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        pickle.dump(array, fp, protocol=3, **pickle_kwargs)
    elif array.flags.f_contiguous and not array.flags.c_contiguous:
        if isfileobj(fp):
            array.T.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='F'):
                fp.write(chunk.tobytes('C'))
    else:
        if isfileobj(fp):
            array.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='C'):
                fp.write(chunk.tobytes('C'))

def read_array(fp, allow_pickle=False, pickle_kwargs=None, *,
               max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array from an NPY file.

    Parameters
    ----------
    fp : file_like object
        If this is not a real file object, then this may take extra memory
        and time.
    allow_pickle : bool, optional
        Whether to allow reading pickled data. Default: False

        .. versionchanged:: 1.16.3
            Made default False in response to CVE-2019-6446.

    pickle_kwargs : dict
        Additional keyword arguments to pass to pickle.load. These are only
        useful when loading object arrays saved on Python 2 when using
        Python 3.
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.
        This option is ignored when `allow_pickle` is passed. In that case
        the file is by definition trusted and the limit is unnecessary.

    Returns
    -------
    array : ndarray
        The array from the data on disk.

    Raises
    ------
    ValueError
        If the data is invalid, or allow_pickle=False and the file contains
        an object array.
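
    Examples
    --------
    A round-trip sketch through an in-memory buffer:

    >>> import io
    >>> import numpy
    >>> fp = io.BytesIO()
    >>> numpy.save(fp, numpy.arange(6).reshape(2, 3))
    >>> _ = fp.seek(0)
    >>> read_array(fp)
    array([[0, 1, 2],
           [3, 4, 5]])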

    """
    if allow_pickle:
        # Effectively ignore max_header_size, since `allow_pickle` indicates
        # that the input is fully trusted.
        max_header_size = 2**64

    version = read_magic(fp)
    _check_version(version)
    shape, fortran_order, dtype = _read_array_header(
        fp, version, max_header_size=max_header_size)
    if len(shape) == 0:
        count = 1
    else:
        count = numpy.multiply.reduce(shape, dtype=numpy.int64)

    # Now read the actual data.
    if dtype.hasobject:
        # The array contained Python objects. We need to unpickle the data.
        if not allow_pickle:
            raise ValueError("Object arrays cannot be loaded when "
                             "allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        try:
            array = pickle.load(fp, **pickle_kwargs)
        except UnicodeError as err:
            # Friendlier error message
            raise UnicodeError("Unpickling a python object failed: %r\n"
                               "You may need to pass the encoding= option "
                               "to numpy.load" % (err,)) from err
    else:
        if isfileobj(fp):
            # We can use the fast fromfile() function.
            array = numpy.fromfile(fp, dtype=dtype, count=count)
        else:
            # This is not a real file. We have to read it the
            # memory-intensive way.
            # crc32 module fails on reads greater than 2 ** 32 bytes,
            # breaking large reads from gzip streams. Chunk reads to
            # BUFFER_SIZE bytes to avoid issue and reduce memory overhead
            # of the read. In non-chunked case count < max_read_count, so
            # only one read is performed.

            # Use np.ndarray instead of np.empty since the latter does
            # not correctly instantiate zero-width string dtypes; see
            # https://github.com/numpy/numpy/pull/6430
            array = numpy.ndarray(count, dtype=dtype)

            if dtype.itemsize > 0:
                # If dtype.itemsize == 0 then there's nothing more to read
                max_read_count = BUFFER_SIZE // min(BUFFER_SIZE, dtype.itemsize)

                for i in range(0, count, max_read_count):
                    read_count = min(max_read_count, count - i)
                    read_size = int(read_count * dtype.itemsize)
                    data = _read_bytes(fp, read_size, "array data")
                    array[i:i+read_count] = numpy.frombuffer(data, dtype=dtype,
                                                             count=read_count)

        if fortran_order:
            array.shape = shape[::-1]
            array = array.transpose()
        else:
            array.shape = shape

    return array

def open_memmap(filename, mode='r+', dtype=None, shape=None,
                fortran_order=False, version=None, *,
                max_header_size=_MAX_HEADER_SIZE):
    """
    Open a .npy file as a memory-mapped array.

    This may be used to read an existing file or create a new one.

    Parameters
    ----------
    filename : str or path-like
        The name of the file on disk. This may *not* be a file-like
        object.
    mode : str, optional
        The mode in which to open the file; the default is 'r+'. In
        addition to the standard file modes, 'c' is also accepted to mean
        "copy on write." See `memmap` for the available mode strings.
    dtype : data-type, optional
        The data type of the array if we are creating a new file in "write"
        mode; otherwise, `dtype` is ignored. The default value is None,
        which results in a data-type of `float64`.
    shape : tuple of int
        The shape of the array if we are creating a new file in "write"
        mode, in which case this parameter is required. Otherwise, this
        parameter is ignored and is thus optional.
    fortran_order : bool, optional
        Whether the array should be Fortran-contiguous (True) or
        C-contiguous (False, the default) if we are creating a new file in
        "write" mode.
    version : tuple of int (major, minor) or None
        If the mode is a "write" mode, then this is the version of the file
        format used to create the file. None means use the oldest
        supported version that is able to store the data. Default: None
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    marray : memmap
        The memory-mapped array.

    Raises
    ------
    ValueError
        If the data or the mode is invalid.
    OSError
        If the file is not found or cannot be opened correctly.

    See Also
    --------
    numpy.memmap
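
    Examples
    --------
    A sketch creating a new file and reading it back (``path`` is a
    hypothetical writable location):

    >>> import os
    >>> import tempfile
    >>> import numpy
    >>> path = os.path.join(tempfile.mkdtemp(), 'example.npy')
    >>> m = open_memmap(path, mode='w+', dtype='<f8', shape=(3,))
    >>> m[:] = [1, 2, 3]
    >>> m.flush()
    >>> numpy.load(path)
    array([1., 2., 3.])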

    """
    if isfileobj(filename):
        raise ValueError("Filename must be a string or a path-like object."
                         " Memmap cannot use existing file handles.")

    if 'w' in mode:
        # We are creating the file, not reading it.
        # Check if we ought to create the file.
        _check_version(version)
        # Ensure that the given dtype is an authentic dtype object rather
        # than just something that can be interpreted as a dtype object.
        dtype = numpy.dtype(dtype)
        if dtype.hasobject:
            msg = "Array can't be memory-mapped: Python objects in dtype."
            raise ValueError(msg)
        d = dict(
            descr=dtype_to_descr(dtype),
            fortran_order=fortran_order,
            shape=shape,
        )
        # If we got here, then it should be safe to create the file.
        with open(os_fspath(filename), mode+'b') as fp:
            _write_array_header(fp, d, version)
            offset = fp.tell()
    else:
        # Read the header of the file first.
        with open(os_fspath(filename), 'rb') as fp:
            version = read_magic(fp)
            _check_version(version)

            shape, fortran_order, dtype = _read_array_header(
                fp, version, max_header_size=max_header_size)
            if dtype.hasobject:
                msg = "Array can't be memory-mapped: Python objects in dtype."
                raise ValueError(msg)
            offset = fp.tell()

    if fortran_order:
        order = 'F'
    else:
        order = 'C'

    # We need to change a write-only mode to a read-write mode since we've
    # already written data to the file.
    if mode == 'w+':
        mode = 'r+'

    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
                          mode=mode, offset=offset)

    return marray

def _read_bytes(fp, size, error_template="ran out of data"):
    """
    Read from file-like object until size bytes are read.
    Raises ValueError if EOF is encountered before size bytes are read.
    Non-blocking objects only supported if they derive from io objects.

    Required as e.g. ZipExtFile in python 2.6 can return less data than
    requested.
    """
    data = bytes()
    while True:
        # io files (default in python3) return None or raise on
        # would-block, python2 file will truncate, probably nothing can be
        # done about that. note that regular files can't be non-blocking
        try:
            r = fp.read(size - len(data))
            data += r
            if len(r) == 0 or len(data) == size:
                break
        except BlockingIOError:
            pass
    if len(data) != size:
        msg = "EOF: reading %s, expected %d bytes got %d"
        raise ValueError(msg % (error_template, size, len(data)))
    else:
        return data