Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
125 views
in Technique[技术] by (71.8m points)

How does ARM Linux emulate the dirty, accessed, and file bits of a PTE?

As per pgtable-2-level.h, ARM Linux has two version of PTE; The Linux PTE and H/W PTE. Linux PTE are stored on below a offset of 1024 bytes.

When handling page fault in handle_pte_fault various function like pte_file, pte_mkdirty, pte_mkyoung, get invoke with the version H/W PTE.

But actually ARM H/W does not support the dirty, accessed and file bit in its PTE.

My question is how does it check the dirty, accessed, file bit of a page on H/W PTE? Ideally it should check those bit on Linux PTE which are stored below an offset of 1024 bytes?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

My question is how does it check the dirty, accessed, file bit of a page on H/W PTE?

TL;DR - they are emulated by taking a page fault on initial accesses.

The answers are given in pgtable-2-level.h,

The "dirty" bit is emulated by only granting hardware write permission iff the page is marked "writable" and "dirty" in the Linux PTE. This means that a write to a clean page will cause a permission fault, and the Linux MM layer will mark the page dirty via handle_pte_fault(). For the hardware to notice the permission change, the TLB entry must be flushed, and ptep_set_access_flags() does that for us.

To take the dirty case, the initial MMU mappings for page are marked read-only. When a process writes to it, a page fault is generated. This is the handle_pte_fault referenced and the main code is in fault.c as do_page_fault and will call the generic handle_mm_fault which eventually ends at handle_pte_fault. You can see the code,

if (flags & FAULT_FLAG_WRITE) {
        if (!pte_write(entry))
            return do_wp_page(mm, vma, address,
                    pte, pmd, ptl, entry);
        entry = pte_mkdirty(entry);  /** Here is the dirty emulation. **/
}

So the Linux generic code will examine the permission of the page, see it is suppose to be writeable and call the pte_mkdirty to mark the page as dirty; the whole process is kicked off or emulated through the fault handler. After the page is marked dirty in the Linux PTE, the ARM PTE is marked as writeable so subsequent writes do not cause a fault.

accessed is identical only both read and write will initially fault. A file bit is also completely unmapped and when a fault occurs, the Linux PTE is consulted to see if it is backed by a file or is it a completely unmapped page fault.

After the hardware table is updated with new permissions and book keeping is done, the user mode program is restarted at the faulting instruction and it would not notice the difference, besides the time interval to handle the fault.


ARM Linux uses 4k pages and the ARM 2nd level page tables are 1k in size (256 entries * 4bytes). From the pgtable-2-level.h comments,

Therefore, we tweak the implementation slightly - we tell Linux that we have 2048 entries in the first level, each of which is 8 bytes (iow, two hardware pointers to the second level.) The second level contains two hardware PTE tables arranged contiguously, preceded by Linux versions which contain the state information Linux needs. We, therefore, end up with 512 entries in the "PTE" level.

In order to use the full 4K page, the PTE entries are structured like,

  1. Linux PTE [n]
  2. Linux PTE [n+1]
  3. ARM PTE [n]
  4. ARM PTE [n+1]

Four 1k items for a full 4k page. These collections of pages must be managed per process to give each a unique view of memory and some information is shared in order to conserve real RAM. The function cpu_set_pte_ext is used to change the physical ARM entries. As each ARM CPU revision uses slightly different tables structures and features, there is an entry in the processor function table that points to an assembler routine. For instance, cpu_v7_set_pte_ext is the ARMv7 or typical original Cortex CPU implementation. This routine is responsible for examining the Linux flags and updating the hardware bits accordingly. As can be seen, the r3 is written to pte+2048 (offset from Linux PTE to hardware PTE) at the end of this routine. The assembler macro armv3_set_pte_ext in proc-marcos.S is used by many of the older CPU variants.

See: Tim's notes on ARM MM
         Page table entry (PTE) descriptor in Linux kernel for ARM


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...