diff --git a/doc/Makefile b/doc/Makefile
index fce831d6fd902bae58272f825d871b4ad473b99e..257bc0bf9631882b2bd52ffe56f1f08dd4ef038b 100644
--- a/doc/Makefile
+++ b/doc/Makefile
@@ -107,7 +107,7 @@ MANPAGES_3_MD_PMEM2 = libpmem2/pmem2_errormsg.3.md libpmem2/pmem2_config_new.3.m
 	libpmem2/pmem2_badblock_context_new.3.md libpmem2/pmem2_badblock_next.3.md \
 	libpmem2/pmem2_badblock_clear.3.md libpmem2/pmem2_config_set_protection.3.md \
 	libpmem2/pmem2_deep_flush.3.md libpmem2/pmem2_source_from_anon.3.md \
-	libpmem2/pmem2_source_device_id.3.md libpmem2/pmem2_source_device_usc.3.md \
+	libpmem2/pmem2_source_device_id.3.md libpmem2/pmem2_source_device_usc.3.md
 MANPAGES_1_MD_PMEM2 =
 ifeq ($(PMEM2_INSTALL),y)
 MANPAGES_3_DUMMY += libpmem2/pmem2_config_delete.3 libpmem2/pmem2_source_from_handle.3 libpmem2/pmem2_source_delete.3 \
diff --git a/doc/libpmem2/libpmem2_unsafe_shutdown.7.md b/doc/libpmem2/libpmem2_unsafe_shutdown.7.md
index c08b73f6f1658b3d6cdc0c6ffaa5f650bf107bac..83c8940c6e72a0c3ea62ac58bfb3711037a2205e 100644
--- a/doc/libpmem2/libpmem2_unsafe_shutdown.7.md
+++ b/doc/libpmem2/libpmem2_unsafe_shutdown.7.md
@@ -8,15 +8,13 @@ date: pmem2 API version 1.0
 ...

 [comment]: <> (SPDX-License-Identifier: BSD-3-Clause
-[comment]: <> (Copyright 2019-2020, Intel Corporation)
+[comment]: <> (Copyright 2020, Intel Corporation)

 [comment]: <> (libpmem2_unsafe_shutdown.7 -- man page for libpmem2 unsafe shutdown)

 [NAME](#name)<br />
 [DESCRIPTION](#description)<br />
 [UNSAFE SHUTDOWN DETECTION](#unsafe-shutdown-detection)<br />
-[DEEP SYNC REQUIRE](#deep-sync-required)<br />
-[CHECKSUM](#checksum)<br />
 [SEE ALSO](#see-also)

 # NAME #
@@ -26,223 +24,61 @@ date: pmem2 API version 1.0
 # DESCRIPTION #

 In systems with the persistent memory support, *a power-fail protected domain*
-covers a set of resources from which data will be flushed to *a persistent medium*
-in case of *a power-failure*. Data stored on *the persistent medium* is preserved
-across power cycles.
-
-The feature to flush all data stored in *the power-fail protected domain* to
-*the persistent medium* is hardware-guaranteed. Since this feature is
-mission-critical, the persistent memory support also requires the possibility
-to detect cases when hardware fails to fulfill its guarantees in the face of
-a power failure. Such an event is called *an unsafe (dirty) shutdown*. In case of
-*the unsafe shutdown*, data that has not been flushed to *the persistent medium*,
-is lost.
-
-To mitigate this risk NVDIMMs expose *an unsafe shutdown counter* (**USC**).
-**USC** value is stored on each NVDIMM and its value is increased each time
-*the unsafe shutdown* happens. **USC** is a monotonically increasing counter.
-
-> **Note**: *The unsafe shutdown* may corrupt data stored on a device, in a file,
-> in a set of files and in a mapping spanning only a part of a file.
-> For the sake of simplicity, all of these cases in this document will be called
-> *the file*.
+covers a set of resources from which the platform will flush data to
+*a persistent medium* in case of *a power-failure*. Data stored on
+*the persistent medium* is preserved across power cycles.
+
+The hardware guarantees that all data stored in
+*the power-fail protected domain* is flushed to *the persistent medium*.
+However, nothing is infallible, and Persistent Memory hardware can expose
+a monotonically increasing *unsafe shutdown counter* (**USC**) that is
+incremented every time a failure of the mechanism above is detected. This
+allows software to discover situations where a running application was
+interrupted by a power failure that led to an unsafe shutdown. Undiscovered
+unsafe shutdowns might cause silent data corruption.
+
+>Note: *The unsafe shutdown* may corrupt data stored on a device, in a file,
+in a set of files, and in a mapping spanning only a part of a file.
+For the sake of simplicity, all of the above cases will be called *file* below.

 # UNSAFE SHUTDOWN DETECTION #

-A basic way of detecting *the unsafe shutdown* is by noticing the **USC** value
-change. It requires comparing the past value of **USC** (```old_usc```)
-with the current **USC** value (```new_usc```). If ```new_usc > old_usc``` it is
-assumed *the unsafe shutdown* occurred.
-
-A current **USC** value can be obtained using **pmem2_source_device_usc**(3).
-The ```old_usc``` value has to be stored by the user on a medium which will
-preserve it across power cycles e.g. *the persistent medium*.
-
-> **Note**: Storing the ```old_usc``` value on *the persistent medium* despite
-> being a natural choice requires considering a few caveats. For details please
-> see the [DEEP SYNC REQUIRE](#deep-sync-required) section.
-
-The **USC**-only approach produces many false-positives. It should be applied
-only if rebuilding the data, in a result of false-detected *unsafe shutdown*,
-is not very expensive.
-
-In the basic *unsafe shutdown* detection method described above a one way of
-inducing **USC** mismatch (and at the same time false-detecting *unsafe shutdown*)
-is by copying *the file*. If *the file* have been copied to a different
-*NVDIMMs set* all the knowledge which have been collected have to be invalidated
-e.g. ```old_usc```. A unique identifier of the *NVDIMMs set* may be obtained
-using **pmem2_source_device_id**(3). It also has to be stored by the
-user on a medium which will preserve it across power cycles (```old_device_id```).
-
-The following pseudo-code takes into account ```old_usc``` and ```old_device_id```:
-
-```c
-if (!unsafe_shutdown_info_initialized) {
-	unsafe_shutdown_info_initialize();
-} else if (old_device_id == new_device_id) {
-	if (old_usc == new_usc) {
-		/* the unsafe shutdown has NOT occurred. */
-	} else {
-		/* the unsafe shutdown HAS occurred. The file might be currupted. */
-	}
-} else {
-	/* the file has been moved, the USC value doesn't matter. */
-	unsafe_shutdown_info_initialize(); /* reinitialize */
-}
-```
-
-*The unsafe shutdown* cannot corrupt data already stored on *the persistent medium*.
-If *the file* has been closed cleanly, it's certain the data hasn't been corrupted.
-To take into account this factor, an indicator of *the file* being in use
-(```file_in_use```) has to be added. Its value should be *FALSE* if the file is
-not in use. Before performing any data modification, its value has to be altered
-to *TRUE*. Similarly, when all modifications are done and *the file* will be
-closed the ```file_in_use``` value has to be set to *FALSE*.
-
-> **Note**: The *unsafe shutdown* detection should be the first thing done after
-> reopening *the file*. This means it has to be done before setting
-> ```pool_in_use``` variable to *TRUE*.
-
-A robust (and fine-grained) *unsafe shutdown* detection which takes into account
-all of the above factors looks as follow:
-
-```c
-if (!unsafe_shutdown_info_initialized) {
-	unsafe_shutdown_info_initialize();
-} else if (old_device_id == new_device_id) {
-	if (old_usc == new_usc) {
-		/* the unsafe shutdown has NOT occurred. */
-	} else {
-
-		/* the unsafe shutdown HAS occurred... */
-
-		if (file_in_use) {
-			/* ... and the file was in use. The file might be corrupted. */
-		} else {
-			/* ... but the file was not in use. Data is safe. */
-			unsafe_shutdown_info_initialize(); /* reinitialize */
-		}
-	}
-} else {
-	/* the USC value doesn't matter. The file was moved... */
-
-	if (file_in_use) {
-		/* ... and was closed NOT cleanly. The file might be corrupted. */
-	} else {
-		/* ... but after being closed cleanly. Data is safe. */
-		unsafe_shutdown_info_initialize(); /* reinitialize */
-	}
-}
-```
-
-# DEEP SYNC REQUIRED #
-
-A *deep sync* is a way of making sure all the data already persisted will be
-preserved in the face of *the unsafe shutdown*. **pmem2_deep_sync**() allows
-proceeding with a program execution after making sure data has reached
-*the persistent medium*. So it is a required element in implementing any
-algorithm detecting *the unsafe shutdown*. For details how to use
-**pmem2_deep_sync**() please see the **pmem2_deep_sync**(3) manual page.
-
-An application which implements the basic way of detecting *the unsafe shutdown*
-has to store the ```old_usc``` value on *the persistent medium*
-(*the power-fail protected domain* is not enough since it cannot rely on it
-implementing an algorithm of detecting its failure). To achieve this the
-```old_usc``` value has to be *deep synced* e.g.:
-
-```c
-struct shutdown_info_basic {
-	uint64_t old_usc;
-} *info;
-
-struct pmem2_source *src;
-struct pmem2_map *map;
-pmem2_persist_fn persist;
-
-/* store a USC value */
-pmem2_source_device_usc(src, &info->old_usc);
-
-/* persist and deep sync the new value */
-persist(info, sizeof(*info));
-pmem2_deep_sync(map, info, sizeof(*info));
-
-/* old_usc is on the persistent medium so unsafe shutdown can't corrupt it. */
-```
-
-Similarily ```old_device_id``` has to be *deep synced*. Things are a little bit
-more complex regarding the ```file_in_use``` variable. It has to be *deep synced*
-after setting its value to *TRUE* and before proceeding with the application but
-also *the file* has to be *deep synced* before changing ```file_in_use``` value
-to *FALSE*. A pseudo-code showing how to deal with it looks as follow:
-
-```c
-uint64_t *file_in_use;
-void *mapping_address;
-size_t mapping_size;
-
-/* after opening the file*/
-assert(*file_in_use == FALSE);
-*file_in_use = TRUE;
-persist(file_in_use, sizeof(*file_in_use));
-pmem2_deep_sync(map, file_in_use, sizeof(*file_in_use));
-
-/* the file is opened. It is safe to modify. */
-
-/* deep sync all changes before closing the file */
-pmem2_deep_sync(map, mapping_address, mapping_size);
-
-/* closing the file */
-*file_in_use = FALSE;
-persist(file_in_use, sizeof(*file_in_use));
-pmem2_deep_sync(map, file_in_use, sizeof(*file_in_use));
-
-/* the file is closed cleanly. */
-```
-
-# CHECKSUM #
-
-Aforementioned logic requires storing multiple variables. To make reasonable use
-of them they must be consistent with each other e.g. if *the unsafe shutdown*
-happens between storing ```old_usc``` and ```old_device_id``` their values are
-inconsistent so after reboot the, the algorithm will detect *the unsafe shutdown*
-where it hasn't happen or, even worse, not detect *the unsafe shutdown* even it
-has happened.
-
-Storing a checksum along with the all required variables gives a certainty that
-the read data is consistent. The pseudo-code using this method looks as follow:
-
-```c
-struct shutdown_info {
-	uint64_t old_usc;
-	uint64_t old_device_id;
-	uint64_t file_in_use;
-	uint64_t checksum;
-} *info;
-
-/* store new values */
-/* ... */
-
-/* validate the new values */
-info->checksum = custom_checksum((void *)info,
-	sizeof(*info) - sizeof(info->checksum));
-
-/* persist and deep sync the whole structure */
-persist(info, sizeof(*info));
-pmem2_deep_sync(map, info, sizeof(*info));
-```
-
-> **Note**: Storing checksum along with other variables requires only a single
-> persist and one *deep sync* for the whole structure. After the *deep sync* the
-> data and checksum are stored on the *persistent medium*. If *unsafe shutdown*
-> happens at any time during this process the user will detect a data-checksum
-> mismatch. But detecting the data-checksum mismatch, despite it means
-> *the unsafe shutdown* took place, in this case it does not mean a *usable data*
-> (other than ```struct shutdown_info```) is corrupted. Since the checksum is
-> modified before modifying any *usable data* or after *deep-syncing* all changes
-> made to the *the usable data*, the data-checksum mismatch means the data was
-> safe during *the unsafe shutdown*.
+Software can detect an unsafe shutdown by watching for a change in the
+unsafe shutdown count value across application startups. Any change can
+indicate that an unsafe shutdown occurred.
+
+Applications can implement a detection mechanism by storing the **USC** retrieved
+from **pmem2_source_device_usc**(3) in Persistent Memory. Then, on subsequent
+startups, the stored value must be compared with a newly retrieved one.
+
+However, this detection method can result in false positives. Moving the file to
+different Persistent Memory devices with possibly different **USC** values would
+lead to false unsafe shutdown detection.
+
+Additionally, relying on **USC** value alone could result in the detection of
+unsafe shutdown events that occur when such a shutdown has no chance of impacting
+the data used by the application, e.g., when nothing is actively using the file.
+
+Applications can avoid false positives associated with moving the file by storing
+device identification, obtained through **pmem2_source_device_id**(3), alongside
+the **USC**. This enables the software to check if the underlying device has
+changed, and reinitialize the stored **USC** in such cases.
+
+The second behavior, detection of possibly irrelevant unsafe shutdown events,
+if undesirable, can be prevented by storing a flag indicating whether the file
+is in use, alongside all the rest of the relevant information.
+
+The application should use **pmem2_deep_flush**(3) when storing any data related
+to unsafe shutdown detection for higher reliability. This helps ensure that the
+detection mechanism is not reliant on the correct functioning of the same hardware
+features it is designed to safeguard.
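
Editor's sketch (not part of the patch): one possible way to combine the stored **USC**, the device identifier, and the in-use flag at startup. The `struct shutdown_info` layout, `DEVICE_ID_MAX`, and `shutdown_check()` are hypothetical names, not part of libpmem2. Fetching the current values with **pmem2_source_device_usc**(3) and **pmem2_source_device_id**(3), and persisting the record with **pmem2_deep_flush**(3), is left out so the block builds on its own. Treating any mismatch while the file was in use as possible corruption is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEVICE_ID_MAX 128 /* assumed upper bound, not a libpmem2 constant */

/* Hypothetical record the application keeps in persistent memory. */
struct shutdown_info {
	uint64_t usc;                  /* USC seen when the record was written */
	char device_id[DEVICE_ID_MAX]; /* NUL-terminated device identifier */
	bool file_in_use;              /* set before modifying, cleared on clean close */
};

enum shutdown_verdict {
	SHUTDOWN_NONE,          /* same device, same USC */
	SHUTDOWN_REINITIALIZE,  /* values differ, file was closed cleanly */
	SHUTDOWN_MAY_BE_CORRUPT /* values differ while the file was in use */
};

/* Compare the stored record with the values read at this startup. */
static enum shutdown_verdict
shutdown_check(const struct shutdown_info *stored, uint64_t current_usc,
	const char *current_id)
{
	assert(stored != NULL && current_id != NULL);

	bool same_device = strncmp(stored->device_id, current_id,
		DEVICE_ID_MAX) == 0;

	/* nothing changed since the record was written */
	if (same_device && stored->usc == current_usc)
		return SHUTDOWN_NONE;

	/* device or USC changed: only relevant if the file was in use */
	return stored->file_in_use ? SHUTDOWN_MAY_BE_CORRUPT
		: SHUTDOWN_REINITIALIZE;
}
```

On `SHUTDOWN_REINITIALIZE` the application would rewrite the record with the current values before it sets the in-use flag again.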
+
+General-purpose software should not assume the presence of **USC** in the platform,
+and should instead appropriately handle any *PMEM2_E_NOSUPP* it encounters.
+Doing otherwise might cause the software to be unnecessarily restrictive about
+the hardware it supports and would prevent, e.g., testing on emulated PMEM.

 # SEE ALSO #

-**pmem2_deep_sync**(3), **pmem2_persist_fn**(3), **pmem2_source_device_id**(3),
+**pmem2_deep_flush**(3), **pmem2_persist_fn**(3), **pmem2_source_device_id**(3),
 **pmem2_source_device_usc**(3) and **<https://pmem.io>**
diff --git a/doc/libpmem2/pmem2_source_device_id.3.md b/doc/libpmem2/pmem2_source_device_id.3.md
index 9b655eb8959275d76a9902d51987f0d40cc58099..eeff69b7c32ebf1bf742848c1f22810138f7ac92 100644
--- a/doc/libpmem2/pmem2_source_device_id.3.md
+++ b/doc/libpmem2/pmem2_source_device_id.3.md
@@ -38,10 +38,12 @@ of all NVDIMMs backing the data source. This function has two operating modes:

 * if *\*id* is NULL the function calculates a buffer length required for
 storing the identifier of the *\*source* device and puts this length in *\*len*
+The more hardware devices back the data source, the longer the length is.

-* if *\*id* is not NULL it should point to a buffer of a length provided in *\*len*.
-When **pmem2_source_device_id**() succeed it will store a unique identifier
-of all NVDIMMs backing the data source.
+* if *\*id* is not NULL it must point to a buffer of length *\*len* provided by
+the previous call to this function.
+On success, **pmem2_source_device_id**() will store a unique identifier
+of all hardware devices backing the data source.

 For details on how to use the unique identifier for detecting *the unsafe shutdown*
 please refer to **libpmem2_unsafe_shutdown**(7) manual page.
@@ -49,29 +51,30 @@ please refer to **libpmem2_unsafe_shutdown**(7) manual page.
 # RETURN VALUE #

 The **pmem2_source_device_id**() function returns 0 on success.
-If the function fails, the *\*id* and *\*len* variables contents are undefined,
+If the function fails, the contents of *\*id* and *\*len* are left unmodified,
 and one of the following errors is returned:

 On all systems:

-* **PMEM2_E_BUFFER_TOO_SMALL** - *\*len* indicates the *\*id* buffer is too short.
+* **PMEM2_E_BUFFER_TOO_SMALL** - the provided buffer of length *\*len* is too
+small to store the full identifier of the backing devices.
+* **PMEM2_E_NOSUPP** - the underlying platform does not expose hardware
+identification.

 On Windows:

 * -**errno** equivalent of return code set by failing
 **GetFinalPathNameByHandleW**(), while trying to resolve the volume path from the
-file handle
+file handle.

 * -**errno** set by failing **malloc**(3), while trying to allocate a buffer
-for storing volume path
+for storing volume path.

 * -**errno** equivalent of return code set by failing
-**CreateFileW**(), while trying to obtain a handle to the volume
+**CreateFileW**(), while trying to obtain a handle to the volume.

 * -**errno** equivalent of return code set by failing
-**DeviceIoControl **(), while trying to obtain volume **USC** value
-
-* **PMEM2_E_NOSUPP** - if getting the **USC** value is not supported on the system
+**DeviceIoControl**(), while trying to obtain volume **USC** value.

 On Linux:

@@ -79,11 +82,7 @@ On Linux:
 descriptor.

 * -**errno** set by failing **ndctl_new**(), while trying to initiate a new
-NDCTL library context
-
-On FreeBSD:
-
-* **PMEM2_E_NOSUPP** - since it is not yet supported
+NDCTL library context.
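
Editor's sketch (not part of the patch): the two operating modes from the DESCRIPTION section suggest a query-then-fill calling pattern. The helper below illustrates it against a callback of the same general shape, so that it builds without libpmem2. `device_id_fn`, `device_id_read()`, and `fake_device_id()` are hypothetical names; in real code a thin wrapper around **pmem2_source_device_id**() would presumably be passed instead of the stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical callback modeled on the two operating modes: with
 * id == NULL it reports the required length in *len, otherwise it
 * fills the caller's buffer of *len bytes.
 */
typedef int (*device_id_fn)(const void *source, char *id, size_t *len);

/* Returns a malloc'ed identifier (caller frees) or NULL on any failure. */
static char *
device_id_read(device_id_fn get_id, const void *source, size_t *len_out)
{
	size_t len = 0;

	assert(get_id != NULL && len_out != NULL);

	/* first call: id == NULL, only the required length is reported */
	if (get_id(source, NULL, &len) != 0 || len == 0)
		return NULL;

	char *id = malloc(len);
	if (id == NULL)
		return NULL;

	/* second call: pass the buffer with the length from the first call */
	if (get_id(source, id, &len) != 0) {
		free(id);
		return NULL;
	}

	*len_out = len;
	return id;
}

/* Stand-in used only to exercise the helper; not part of libpmem2. */
static int
fake_device_id(const void *source, char *id, size_t *len)
{
	static const char fake[] = "fake-device-0001";

	(void)source;
	if (id == NULL) {
		*len = sizeof(fake);
		return 0;
	}
	if (*len < sizeof(fake))
		return -1;
	memcpy(id, fake, sizeof(fake));
	return 0;
}
```

Sizing the buffer from the first call avoids guessing a fixed maximum, which matters because the length grows with the number of backing devices.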

 # SEE ALSO #

diff --git a/doc/libpmem2/pmem2_source_device_usc.3.md b/doc/libpmem2/pmem2_source_device_usc.3.md
index afafd299f8aab8ba307249500615792105bc2a5a..20cf67a9d3922703885f9ad832bec281cf01e9cc 100644
--- a/doc/libpmem2/pmem2_source_device_usc.3.md
+++ b/doc/libpmem2/pmem2_source_device_usc.3.md
@@ -34,34 +34,38 @@ int pmem2_source_device_usc(const struct pmem2_source *source, uint64_t *usc);

 # DESCRIPTION #

-The **pmem2_source_device_usc**() function retrieves a sum of *unsafe shutdown counter*
-(**USC**) values of all NVDIMMs backing the data source and puts the sum in *\*usc*.
+The **pmem2_source_device_usc**() function retrieves the sum of the
+*unsafe shutdown count* (**USC**) values of all hardware devices backing
+the data source and stores it in *\*usc*.

-For details on what **USC** is and how to use it for detecting *the unsafe shutdown*
-please refer to **libpmem2_unsafe_shutdown**(7) manual page.
+Please refer to **libpmem2_unsafe_shutdown**(7) for a detailed description of how
+to properly consume this information.

 # RETURN VALUE #

 The **pmem2_source_device_usc**() function returns 0 on success.
-If the function fails, the *\*usc* variable content is undefined, and one of
+If the function fails, the *\*usc* variable content is left unmodified, and one of
 the following errors is returned:

+On all systems:
+
+* **PMEM2_E_NOSUPP** - the underlying platform does not expose unsafe shutdown
+count information.
+
 On Windows:

 * -**errno** equivalent of return code set by failing
 **GetFinalPathNameByHandleW**(), while trying to resolve volume path from the
-file handle
+file handle.

 * -**errno** set by failing **malloc**(3), while trying to allocate a buffer
-for storing volume path
+for storing volume path.

 * -**errno** equivalent of return code set by failing
-**CreateFileW**(), while trying to obtain a handle to the volume
+**CreateFileW**(), while trying to obtain a handle to the volume.

 * -**errno** equivalent of return code set by failing
-**DeviceIoControl **(), while trying to obtain volume **USC** value
-
-* **PMEM2_E_NOSUPP** - if getting the **USC** value is not supported on the system
+**DeviceIoControl**(), while trying to obtain volume **USC** value.

 On Linux:

@@ -69,14 +73,10 @@ On Linux:
 descriptor.

 * -**errno** set by failing **ndctl_new**(), while trying to initiate a new
-NDCTL library context
-
-* -**errno** set by failing **ndctl_dimm_get_dirty_shutdown **(),
-while trying to obtain DIMM **USC** value
-
-On FreeBSD:
+NDCTL library context.

-* **PMEM2_E_NOSUPP** - since it is not yet supported
+* -**errno** set by failing **ndctl_dimm_get_dirty_shutdown**(),
+while trying to obtain DIMM **USC** value.

 # SEE ALSO #