diff --git a/doc/Makefile b/doc/Makefile
index bd6ef6688d528d22ae2ded9f19488957ba5efae9..fce831d6fd902bae58272f825d871b4ad473b99e 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -92,7 +92,7 @@ MANPAGES_3_DUMMY_EXP =
 MANPAGES_1_DUMMY_EXP =
 
 # libpmem2
-MANPAGES_7_MD_PMEM2 = libpmem2/libpmem2.7.md
+MANPAGES_7_MD_PMEM2 = libpmem2/libpmem2.7.md libpmem2/libpmem2_unsafe_shutdown.7.md
 MANPAGES_5_MD_PMEM2 =
 MANPAGES_3_MD_PMEM2 = libpmem2/pmem2_errormsg.3.md libpmem2/pmem2_config_new.3.md libpmem2/pmem2_map.3.md \
 	libpmem2/pmem2_unmap.3.md libpmem2/pmem2_map_get_address.3.md libpmem2/pmem2_map_get_size.3.md \
@@ -106,7 +106,8 @@ MANPAGES_3_MD_PMEM2 = libpmem2/pmem2_errormsg.3.md libpmem2/pmem2_config_new.3.m
 	libpmem2/pmem2_config_set_vm_reservation.3.md libpmem2/pmem2_vm_reservation_new.3.md \
 	libpmem2/pmem2_badblock_context_new.3.md libpmem2/pmem2_badblock_next.3.md \
 	libpmem2/pmem2_badblock_clear.3.md libpmem2/pmem2_config_set_protection.3.md \
-	libpmem2/pmem2_deep_flush.3.md libpmem2/pmem2_source_from_anon.3.md
+	libpmem2/pmem2_deep_flush.3.md libpmem2/pmem2_source_from_anon.3.md \
+	libpmem2/pmem2_source_device_id.3.md libpmem2/pmem2_source_device_usc.3.md
 MANPAGES_1_MD_PMEM2 =
 ifeq ($(PMEM2_INSTALL),y)
 MANPAGES_3_DUMMY += libpmem2/pmem2_config_delete.3 libpmem2/pmem2_source_from_handle.3 libpmem2/pmem2_source_delete.3 \
diff --git a/doc/libpmem2/.gitignore b/doc/libpmem2/.gitignore
index 710e48a6d6be83dc1a185d2dc402bffe1cb36593..1474ccdf6567edd1f69005e6a71088c1f8970f05 100644
--- a/doc/libpmem2/.gitignore
+++ b/doc/libpmem2/.gitignore
@@ -1,4 +1,5 @@
 libpmem2.7
+libpmem2_unsafe_shutdown.7
 pmem2_badblock_context_new.3
 pmem2_badblock_next.3
 pmem2_badblock_clear.3
@@ -25,6 +26,8 @@ pmem2_source_alignment.3
 pmem2_source_from_fd.3
 pmem2_source_from_anon.3
 pmem2_source_size.3
+pmem2_source_device_id.3
+pmem2_source_device_usc.3
 pmem2_unmap.3
 pmem2_perror.3
 pmem2_vm_reservation_new.3
diff --git a/doc/libpmem2/libpmem2_unsafe_shutdown.7.md b/doc/libpmem2/libpmem2_unsafe_shutdown.7.md
new file mode 100644
index 0000000000000000000000000000000000000000..c08b73f6f1658b3d6cdc0c6ffaa5f650bf107bac
--- /dev/null
+++ b/doc/libpmem2/libpmem2_unsafe_shutdown.7.md
@@ -0,0 +1,248 @@
+---
+layout: manual
+Content-Style: 'text/css'
+title: _MP(LIBPMEM2_UNSAFE_SHUTDOWN, 7)
+collection: libpmem2
+header: PMDK
+date: pmem2 API version 1.0
+...
+
+[comment]: <> (SPDX-License-Identifier: BSD-3-Clause)
+[comment]: <> (Copyright 2019-2020, Intel Corporation)
+
+[comment]: <> (libpmem2_unsafe_shutdown.7 -- man page for libpmem2 unsafe shutdown)
+
+[NAME](#name)<br />
+[DESCRIPTION](#description)<br />
+[UNSAFE SHUTDOWN DETECTION](#unsafe-shutdown-detection)<br />
+[DEEP SYNC REQUIRED](#deep-sync-required)<br />
+[CHECKSUM](#checksum)<br />
+[SEE ALSO](#see-also)
+
+# NAME #
+
+**libpmem2_unsafe_shutdown** - libpmem2 unsafe shutdown
+
+# DESCRIPTION #
+
+In systems with persistent memory support, *a power-fail protected domain*
+covers a set of resources from which data will be flushed to *a persistent medium*
+in case of *a power failure*. Data stored on *the persistent medium* is preserved
+across power cycles.
+
+The ability to flush all data stored in *the power-fail protected domain* to
+*the persistent medium* is hardware-guaranteed. Since this feature is
+mission-critical, persistent memory support also requires a way
+to detect cases when hardware fails to fulfill its guarantees in the face of
+a power failure. Such an event is called *an unsafe (dirty) shutdown*. In case of
+*the unsafe shutdown*, data that has not been flushed to *the persistent medium*
+is lost.
+
+To mitigate this risk NVDIMMs expose *an unsafe shutdown counter* (**USC**).
+The **USC** value is stored on each NVDIMM and is incremented each time
+*the unsafe shutdown* happens. **USC** is a monotonically increasing counter.
+
+> **Note**: *The unsafe shutdown* may corrupt data stored on a device, in a file,
+> in a set of files, or in a mapping spanning only a part of a file.
+> For the sake of simplicity, all of these cases are called *the file*
+> in this document.
+
+# UNSAFE SHUTDOWN DETECTION #
+
+A basic way of detecting *the unsafe shutdown* is to notice a change in the
+**USC** value. It requires comparing the past value of **USC** (```old_usc```)
+with the current **USC** value (```new_usc```). If ```new_usc > old_usc``` it is
+assumed *the unsafe shutdown* occurred.
+
+The current **USC** value can be obtained using **pmem2_source_device_usc**(3).
+The ```old_usc``` value has to be stored by the user on a medium which will
+preserve it across power cycles e.g. *the persistent medium*.
+
+> **Note**: Storing the ```old_usc``` value on *the persistent medium*, despite
+> being a natural choice, requires considering a few caveats. For details please
+> see the [DEEP SYNC REQUIRED](#deep-sync-required) section.
+
+The **USC**-only approach produces many false positives. It should be applied
+only if rebuilding the data, as a result of a falsely detected *unsafe shutdown*,
+is not very expensive.
+
+In the basic *unsafe shutdown* detection method described above, one way of
+inducing a **USC** mismatch (and thereby falsely detecting *unsafe shutdown*)
+is by copying *the file*. If *the file* has been copied to a different
+*NVDIMMs set*, all the knowledge which has been collected (e.g. ```old_usc```)
+has to be invalidated. A unique identifier of the *NVDIMMs set* may be obtained
+using **pmem2_source_device_id**(3). It also has to be stored by the
+user on a medium which will preserve it across power cycles (```old_device_id```).
+
+The following pseudo-code takes into account ```old_usc``` and ```old_device_id```:
+
+```c
+if (!unsafe_shutdown_info_initialized) {
+	unsafe_shutdown_info_initialize();
+} else if (old_device_id == new_device_id) {
+	if (old_usc == new_usc) {
+		/* the unsafe shutdown has NOT occurred. */
+	} else {
+		/* the unsafe shutdown HAS occurred. The file might be corrupted. */
+	}
+} else {
+	/* the file has been moved, the USC value doesn't matter. */
+	unsafe_shutdown_info_initialize(); /* reinitialize */
+}
+```
+
+*The unsafe shutdown* cannot corrupt data already stored on *the persistent medium*.
+If *the file* has been closed cleanly, it's certain the data hasn't been corrupted.
+To take this factor into account, an indicator of *the file* being in use
+(```file_in_use```) has to be added. Its value should be *FALSE* if the file is
+not in use. Before performing any data modification, its value has to be altered
+to *TRUE*. Similarly, when all modifications are done and *the file* is about to be
+closed, the ```file_in_use``` value has to be set to *FALSE*.
+
+> **Note**: The *unsafe shutdown* detection should be the first thing done after
+> reopening *the file*. This means it has to be done before setting
+> the ```file_in_use``` variable to *TRUE*.
+
+A robust (and fine-grained) *unsafe shutdown* detection which takes into account
+all of the above factors looks as follows:
+
+```c
+if (!unsafe_shutdown_info_initialized) {
+	unsafe_shutdown_info_initialize();
+} else if (old_device_id == new_device_id) {
+	if (old_usc == new_usc) {
+		/* the unsafe shutdown has NOT occurred. */
+	} else {
+
+		/* the unsafe shutdown HAS occurred... */
+
+		if (file_in_use) {
+			/* ... and the file was in use. The file might be corrupted. */
+		} else {
+			/* ... but the file was not in use. Data is safe. */
+			unsafe_shutdown_info_initialize(); /* reinitialize */
+		}
+	}
+} else {
+	/* the USC value doesn't matter. The file was moved... */
+
+	if (file_in_use) {
+		/* ... and was NOT closed cleanly. The file might be corrupted. */
+	} else {
+		/* ... but after being closed cleanly. Data is safe. */
+		unsafe_shutdown_info_initialize(); /* reinitialize */
+	}
+}
+```
+
+# DEEP SYNC REQUIRED #
+
+A *deep sync* is a way of making sure all the data already persisted will be
+preserved in the face of *the unsafe shutdown*. **pmem2_deep_sync**() allows
+proceeding with program execution only after making sure data has reached
+*the persistent medium*. So it is a required element in implementing any
+algorithm detecting *the unsafe shutdown*. For details on how to use
+**pmem2_deep_sync**() please see the **pmem2_deep_sync**(3) manual page.
+
+An application which implements the basic way of detecting *the unsafe shutdown*
+has to store the ```old_usc``` value on *the persistent medium*
+(*the power-fail protected domain* is not enough, since an algorithm detecting
+the domain's failure cannot rely on the domain itself). To achieve this the
+```old_usc``` value has to be *deep synced* e.g.:
+
+```c
+struct shutdown_info_basic {
+	uint64_t old_usc;
+} *info;
+
+struct pmem2_source *src;
+struct pmem2_map *map;
+pmem2_persist_fn persist;
+
+/* store a USC value */
+pmem2_source_device_usc(src, &info->old_usc);
+
+/* persist and deep sync the new value */
+persist(info, sizeof(*info));
+pmem2_deep_sync(map, info, sizeof(*info));
+
+/* old_usc is on the persistent medium so unsafe shutdown can't corrupt it. */
+```
+
+Similarly, ```old_device_id``` has to be *deep synced*. Things are a little bit
+more complex regarding the ```file_in_use``` variable. It has to be *deep synced*
+after setting its value to *TRUE* and before proceeding with the application, but
+*the file* also has to be *deep synced* before changing the ```file_in_use``` value
+to *FALSE*. Pseudo-code showing how to deal with it looks as follows:
+
+```c
+uint64_t *file_in_use;
+void *mapping_address;
+size_t mapping_size;
+
+/* after opening the file */
+assert(*file_in_use == FALSE);
+*file_in_use = TRUE;
+persist(file_in_use, sizeof(*file_in_use));
+pmem2_deep_sync(map, file_in_use, sizeof(*file_in_use));
+
+/* the file is opened. It is safe to modify. */
+
+/* deep sync all changes before closing the file */
+pmem2_deep_sync(map, mapping_address, mapping_size);
+
+/* closing the file */
+*file_in_use = FALSE;
+persist(file_in_use, sizeof(*file_in_use));
+pmem2_deep_sync(map, file_in_use, sizeof(*file_in_use));
+
+/* the file is closed cleanly. */
+```
+
+# CHECKSUM #
+
+The aforementioned logic requires storing multiple variables. To make reasonable use
+of them they must be consistent with each other e.g. if *the unsafe shutdown*
+happens between storing ```old_usc``` and ```old_device_id``` their values are
+inconsistent, so after the reboot the algorithm will detect *the unsafe shutdown*
+where it hasn't happened or, even worse, not detect *the unsafe shutdown* even though
+it has happened.
+
+Storing a checksum along with all the required variables gives certainty that
+the read data is consistent. The pseudo-code using this method looks as follows:
+
+```c
+struct shutdown_info {
+	uint64_t old_usc;
+	uint64_t old_device_id;
+	uint64_t file_in_use;
+	uint64_t checksum;
+} *info;
+
+/* store new values */
+/* ... */
+
+/* compute the checksum over the new values */
+info->checksum = custom_checksum((void *)info,
+		sizeof(*info) - sizeof(info->checksum));
+
+/* persist and deep sync the whole structure */
+persist(info, sizeof(*info));
+pmem2_deep_sync(map, info, sizeof(*info));
+```
+
+> **Note**: Storing the checksum along with other variables requires only a single
+> persist and one *deep sync* for the whole structure. After the *deep sync* the
+> data and checksum are stored on the *persistent medium*. If *unsafe shutdown*
+> happens at any time during this process the user will detect a data-checksum
+> mismatch. Although the data-checksum mismatch means that
+> *the unsafe shutdown* took place, in this case it does not mean any *usable data*
+> (other than ```struct shutdown_info```) is corrupted. Since the checksum is
+> modified before modifying any *usable data* or after *deep-syncing* all changes
+> made to *the usable data*, the data-checksum mismatch means the data was
+> safe during *the unsafe shutdown*.
+
+# SEE ALSO #
+
+**pmem2_deep_sync**(3), **pmem2_persist_fn**(3), **pmem2_source_device_id**(3),
+**pmem2_source_device_usc**(3) and **<https://pmem.io>**
diff --git a/doc/libpmem2/pmem2_source_device_id.3.md b/doc/libpmem2/pmem2_source_device_id.3.md
new file mode 100644
index 0000000000000000000000000000000000000000..9b655eb8959275d76a9902d51987f0d40cc58099
--- /dev/null
+++ b/doc/libpmem2/pmem2_source_device_id.3.md
@@ -0,0 +1,91 @@
+---
+layout: manual
+Content-Style: 'text/css'
+title: _MP(PMEM2_SOURCE_DEVICE_ID, 3)
+collection: libpmem2
+header: PMDK
+date: pmem2 API version 1.0
+...
+
+[comment]: <> (SPDX-License-Identifier: BSD-3-Clause)
+[comment]: <> (Copyright 2020, Intel Corporation)
+
+[comment]: <> (pmem2_source_device_id.3 -- man page for pmem2_source_device_id)
+
+[NAME](#name)<br />
+[SYNOPSIS](#synopsis)<br />
+[DESCRIPTION](#description)<br />
+[RETURN VALUE](#return-value)<br />
+[SEE ALSO](#see-also)<br />
+
+# NAME #
+
+**pmem2_source_device_id**() - returns the unique identifier of a device
+
+# SYNOPSIS #
+
+```c
+#include <libpmem2.h>
+
+struct pmem2_source;
+int pmem2_source_device_id(const struct pmem2_source *source, char *id, size_t *len);
+```
+
+# DESCRIPTION #
+
+The **pmem2_source_device_id**() function retrieves a unique identifier
+of all NVDIMMs backing the data source. This function has two operating modes:
+
+* if *\*id* is NULL the function calculates the buffer length required for
+storing the identifier of the *\*source* device and puts this length in *\*len*
+
+* if *\*id* is not NULL it should point to a buffer of the length provided in *\*len*.
+When **pmem2_source_device_id**() succeeds it stores there a unique identifier
+of all NVDIMMs backing the data source.
+
+For details on how to use the unique identifier for detecting *the unsafe shutdown*
+please refer to the **libpmem2_unsafe_shutdown**(7) manual page.
+
+# RETURN VALUE #
+
+The **pmem2_source_device_id**() function returns 0 on success.
+If the function fails, the contents of the *\*id* and *\*len* variables are undefined,
+and one of the following errors is returned:
+
+On all systems:
+
+* **PMEM2_E_BUFFER_TOO_SMALL** - *\*len* indicates the *\*id* buffer is too short.
+
+On Windows:
+
+* -**errno** equivalent of return code set by failing
+**GetFinalPathNameByHandleW**(), while trying to resolve the volume path from the
+file handle
+
+* -**errno** set by failing **malloc**(3), while trying to allocate a buffer
+for storing volume path
+
+* -**errno** equivalent of return code set by failing
+**CreateFileW**(), while trying to obtain a handle to the volume
+
+* -**errno** equivalent of return code set by failing
+**DeviceIoControl**(), while trying to obtain volume **USC** value
+
+* **PMEM2_E_NOSUPP** - if getting the **USC** value is not supported on the system
+
+On Linux:
+
+* -**errno** set by failing **fstat**(2), while trying to validate the file
+descriptor.
+
+* -**errno** set by failing **ndctl_new**(), while trying to initiate a new
+NDCTL library context
+
+On FreeBSD:
+
+* **PMEM2_E_NOSUPP** - since it is not yet supported
+
+# SEE ALSO #
+
+**fstat**(2), **errno**(3), **malloc**(3), **libpmem2_unsafe_shutdown**(7),
+and **<http://pmem.io>**
diff --git a/doc/libpmem2/pmem2_source_device_usc.3.md b/doc/libpmem2/pmem2_source_device_usc.3.md
new file mode 100644
index 0000000000000000000000000000000000000000..afafd299f8aab8ba307249500615792105bc2a5a
--- /dev/null
+++ b/doc/libpmem2/pmem2_source_device_usc.3.md
@@ -0,0 +1,84 @@
+---
+layout: manual
+Content-Style: 'text/css'
+title: _MP(PMEM2_SOURCE_DEVICE_USC, 3)
+collection: libpmem2
+header: PMDK
+date: pmem2 API version 1.0
+...
+
+[comment]: <> (SPDX-License-Identifier: BSD-3-Clause)
+[comment]: <> (Copyright 2020, Intel Corporation)
+
+[comment]: <> (pmem2_source_device_usc.3 -- man page for pmem2_source_device_usc)
+
+[NAME](#name)<br />
+[SYNOPSIS](#synopsis)<br />
+[DESCRIPTION](#description)<br />
+[RETURN VALUE](#return-value)<br />
+[SEE ALSO](#see-also)<br />
+
+# NAME #
+
+**pmem2_source_device_usc**() - returns the *unsafe shutdown counter* value of a
+device
+
+# SYNOPSIS #
+
+```c
+#include <libpmem2.h>
+
+struct pmem2_source;
+int pmem2_source_device_usc(const struct pmem2_source *source, uint64_t *usc);
+```
+
+# DESCRIPTION #
+
+The **pmem2_source_device_usc**() function retrieves the sum of the *unsafe shutdown counter*
+(**USC**) values of all NVDIMMs backing the data source and puts the sum in *\*usc*.
+
+For details on what **USC** is and how to use it for detecting *the unsafe shutdown*
+please refer to the **libpmem2_unsafe_shutdown**(7) manual page.
+
+# RETURN VALUE #
+
+The **pmem2_source_device_usc**() function returns 0 on success.
+If the function fails, the content of the *\*usc* variable is undefined, and one of
+the following errors is returned:
+
+On Windows:
+
+* -**errno** equivalent of return code set by failing
+**GetFinalPathNameByHandleW**(), while trying to resolve volume path from the
+file handle
+
+* -**errno** set by failing **malloc**(3), while trying to allocate a buffer
+for storing volume path
+
+* -**errno** equivalent of return code set by failing
+**CreateFileW**(), while trying to obtain a handle to the volume
+
+* -**errno** equivalent of return code set by failing
+**DeviceIoControl**(), while trying to obtain volume **USC** value
+
+* **PMEM2_E_NOSUPP** - if getting the **USC** value is not supported on the system
+
+On Linux:
+
+* -**errno** set by failing **fstat**(2), while trying to validate the file
+descriptor.
+
+* -**errno** set by failing **ndctl_new**(), while trying to initiate a new
+NDCTL library context
+
+* -**errno** set by failing **ndctl_dimm_get_dirty_shutdown**(),
+while trying to obtain DIMM **USC** value
+
+On FreeBSD:
+
+* **PMEM2_E_NOSUPP** - since it is not yet supported
+
+# SEE ALSO #
+
+**fstat**(2), **errno**(3), **malloc**(3), **libpmem2_unsafe_shutdown**(7),
+and **<http://pmem.io>**
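
The decision tree from the UNSAFE SHUTDOWN DETECTION section of libpmem2_unsafe_shutdown.7.md can be condensed into one pure function, which makes the branches easy to unit-test without NVDIMM hardware. The following is only a minimal sketch and is not part of this patch or of libpmem2: `usd_check`, `struct usd_info` and `enum usd_verdict` are hypothetical names, and the device identifier is assumed to be a NUL-terminated string of at most 63 characters. The caller is assumed to have already fetched the current values with **pmem2_source_device_usc**() and **pmem2_source_device_id**().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: none of these are part of the libpmem2 API. */
enum usd_verdict {
	USD_FIRST_USE,		/* no stored info yet, initialize it */
	USD_CLEAN,		/* the unsafe shutdown has NOT occurred */
	USD_REINITIALIZE,	/* data is safe, stored info is stale */
	USD_MAYBE_CORRUPTED	/* the file was in use, data is at risk */
};

/* Values read back from the persistent medium (assumed layout). */
struct usd_info {
	bool initialized;
	uint64_t usc;
	char device_id[64];
	bool file_in_use;
};

static enum usd_verdict
usd_check(const struct usd_info *old, uint64_t new_usc,
		const char *new_device_id)
{
	assert(old != NULL && new_device_id != NULL);

	if (!old->initialized)
		return USD_FIRST_USE;

	bool same_device = strcmp(old->device_id, new_device_id) == 0;
	if (same_device && old->usc == new_usc)
		return USD_CLEAN;

	/*
	 * Either the USC changed or the file was moved to another
	 * NVDIMMs set. In both cases only a file that was in use
	 * (not closed cleanly) may be corrupted.
	 */
	return old->file_in_use ? USD_MAYBE_CORRUPTED : USD_REINITIALIZE;
}
```

On `USD_FIRST_USE` and `USD_REINITIALIZE` the caller would store fresh `usc` and `device_id` values and deep sync them, as the DEEP SYNC REQUIRED section describes. All of this has to happen before `file_in_use` is set to *TRUE*.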