Developers Planet

20 May 2021

Marcin Juszkiewicz

My twenty plus years of IRC

In 1996 I started my studies at Białystok University of Technology. During one of the early days I found a corridor with text terminals. Some time later I joined that crowd and started using an HP-2623A terminal with a SunOS account.

And one of the first applications the crowd showed me was ircII (others were bash, screen, pine and ncftp). So I have been an IRC user for over 24 years now, and this XKCD comic could be about me:

XKCD “Team Chat” (title text: “2078: He announces that he's finally making the jump from screen+irssi to tmux+weechat.”)

Clients

As I wrote above, I started with ircII. It was painful to use, so scripts quickly landed — Venom, Lice and some others. I tried Epic, Epic2000 and some other clients, to finally end up with Irssi. I have never been a fan of GUI-based ones.

There were moments when people used CTCP VERSION to check which clients other users were running. In the old Amiga days I usually had it set to one similar to AmIRC's, but with the version bumped above whatever was released. Simple trolling for all those curious people. Nowadays it simply replies with “telnet”, and there was a day when I logged in to an IRC server and exchanged some messages using just telnet :D
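IRC is a plain-text protocol (RFC 1459/2812), which is what makes the telnet trick possible. A raw session looks roughly like this (the server and channel names here are just placeholders):

```
$ telnet irc.example.net 6667
NICK hrw
USER hrw 0 * :Marcin
JOIN #channel
PRIVMSG #channel :hello, sent over raw telnet
QUIT :bye
```

The only extra work is answering the server's periodic PING lines with a matching PONG, or the connection gets dropped.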

Networks

For years I was a user of IRCnet. It was popular in Poland, so why bother checking other networks? But as time passed and I became more involved in FOSS projects, there was a need to start using Freenode, then OFTC, Mozilla etc.

I checked how old my accounts are, as that nicely shows when I started using each network.

IRCnet

IRCnet was my first IRC network. I stopped using it a few years ago, as all the channels I was on went quiet or migrated elsewhere (mostly to Freenode).

For years I was visible on the Amiga channels #amisia and #amigapl, where I met several friends, and I am still in contact with many of them.

Freenode

The journey started on 15th March 2004. This was the time when I started playing with OpenEmbedded and knew that this was a project where I would spend some of my free time (it became a hobby, then a job).

It was a place where CentOS, Fedora, Linaro, OpenStack and several other projects were present.

NickServ- Information on Hrw (account hrw):
NickServ- Registered : Mar 15 10:59:47 2004 (17y 9w 6d ago)
NickServ- Last addr  : ~hrw@redhat/hrw
NickServ- vHost      : redhat/hrw
NickServ- Last seen  : now
NickServ- Flags      : HideMail
NickServ- *** End of Info ***

OFTC

For me, OFTC means Debian. Later also virtualization stuff, as the QEMU and libvirt folks sit there.

NickServ- Nickname information for hrw (Marcin Juszkiewicz)
NickServ- hrw is currently online
NickServ-   Time registered: Fri 10 Jun 2011 17:43:55 +0000 (9y 11m 10d 14:47:05 ago)
NickServ- Account last quit: Tue 18 May 2021 09:07:55 +0000 (1d 23:23:05 ago)
NickServ- Last quit message: Remote host closed the connection
NickServ-         Last host: 00019652.user.oftc.net
NickServ-               URL: http://marcin.juszkiewicz.com.pl/
NickServ-      Cloak string: Not set
NickServ-          Language: English (0)
NickServ-           ENFORCE: ON
NickServ-            SECURE: OFF
NickServ-           PRIVATE: ON
NickServ-             CLOAK: ON
NickServ-          VERIFIED: YES

Libera

And since yesterday I am on Libera as well.

NickServ- Information on hrw (account hrw):
NickServ- Registered : May 19 15:26:18 2021 +0000 (17h 32m 35s ago)
NickServ- Last seen  : now
NickServ- Flags      : HideMail, Private
NickServ- *** End of Info ***

What next?

I have seen and used several instant messaging platforms. All of them were younger than IRC. Many of them are no longer popular; several no longer exist. IRC survived, so I continue to use it.

by Marcin Juszkiewicz at 20 May 2021, 08:31

03 May 2021

Jon Masters

Introduction and call for topics

This blog is a personal pet project of mine. I live and breathe Computer Architecture, semiconductor technology, and all things in between. I was originally a Computer Scientist, and so I am self-taught when it comes to computer architecture, digital design, and other parts of the “full stack”. I don’t believe in arbitrary barriers between these disciplines. In fact, I believe that hardware and software are better when designed together, and that we as engineers are better when we see all of us as being on the same team, together.

My aim over the coming months is to cover a mixture of topics. Some will be “introductory” (but arbitrarily detailed). For example, I will explain exactly how a program compiled/assembled into machine code is executed in a modern Out-of-Order processor, from before fetch (prefetch/predecode, etc.) through to dispatch and backend data flow execution. I will explain how speculation works, what kinds of predictors and state are kept, etc. I will walk through what a cache coherency protocol is, how it works, and how it differs from memory consistency. I will explain how modern “heterogeneous” and “composable” systems work. I will attempt to provide useful examples and references to gem5 models and the like.

Some posts will dive into specific topics that are more advanced in nature. For example, I have recently been consumed with memory barriers and how one might speculate right through them using concepts familiar to those working on transactional memory. I might write this up at a high level, and give a related example of TSO implementation on x86.

My writing style will be technical in nature, but I intend for posts to be broadly accessible to those who work in the technology field (but not necessarily in microprocessor design). I will not cover anything proprietary or confidential to any one vendor, but general concepts.

Please reach out to me or comment below with ideas for topics you would like covered.

by jonmasters at 03 May 2021, 05:03

01 May 2021

Jon Masters

I see dead uops: thoughts on the latest Spectre paper targeting uop caches

Last night, a group of computer security researchers led by Ashish Venkat (University of Virginia) published a paper titled “I See Dead µops: Leaking Secrets via Intel/AMD Micro-Op Caches” which they have submitted for ISCA ’21 (International Symposium on Computer Architecture, a prestigious academic conference). The paper concerns a structure called the micro-op (uop) cache that is commonly used within modern microprocessors. A thorough analysis of the organization and implementation of these structures allows the research team to propose novel timing side-channel attacks similar to those of “Spectre”.

Modern microprocessors are decoupled into a “front-end” that decodes the instructions as presented by the programmer (in the form of compiled machine code) and a “back-end” that actually executes instructions (quite likely in the form of a data flow model, Out-of-Order with respect to how the program was actually written, but nonetheless respecting the actual dependencies). Notice I didn’t say “these instructions” a second time. What the back-end actually executes may look similar to the machine code presented at the front, but it might also look very different. Perhaps it has simply been optimized (e.g. fusing two – or more – adjacent instructions together into a more efficient one) but more often than not the backend of the machine will be executing very different instructions known as micro-ops.

A micro-op (uop) is a very simple instruction. For example, it might be an ADD instruction. It may take a couple of inputs and produce one output. In some cases (such as a simple add), the “macro” ops of the machine code written by the tooling used by a programmer map 1:1 on to the uops used by the machine. But often, a single macro op is actually decomposed into many individual uops. In RISC-like machines (Arm, RISC-V, etc.) such cracking into multiple uops is usually limited to relatively complex instructions (such as certain loads that also perform complex address generation logic), but most modern CISC machines (e.g. x86) will actually decompose every x86 macro instruction into a stream of RISC uops.

In such a machine, the processor frontend will be coupled to the instruction cache through a “fetch” unit that will retrieve a certain number of bytes (e.g. 16 in the paper cited) per processor cycle (clock “tick”) from the “L1” (level 1) instruction cache. The L1 instruction cache contains the program machine instruction “code”, represented as a sequence of bytes, typically stored into 64 byte cache “lines”. The L1 is small but extremely fast. It is typically “coherent” with respect to other caches in the machine, meaning that if you were to write self-modifying code out via the data cache, the instruction cache automatically observes this. The L1 might do many other things, such as pre-decode instructions (e.g. tag those that are branches ahead of time, etc.), or be the target of other timing side-channel attacks (such as the original Spectre attacks), but that is not the focus of this latest analysis.

Once bytes of instructions are fetched, they are run through decoders. Particularly with x86 (but also elsewhere), decoding macro operations is expensive. Worse, if a piece of code is “hot” because it is part of a loop that is being executed hundreds or millions of times, decoding the same macro instructions over and over can add needless expense (both to performance overhead, but also energy use). For this reason, it is common in some designs to cache decoded uops and replay them out of the uop cache as needed. Think of the uop cache as a companion to the other caches we already have in our modern processors – the instruction and data caches being just two of many different examples in modern designs.

Keeping a cache of decoded uops can save a lot of overhead, and increase throughput, especially when doing a CISC->RISC translation. Individual x86 macro instructions are variable width, up to 15 bytes in length (perhaps theoretically 16?), and decoding them is extremely unpleasant. Worse, the throughput per cycle is dependent upon the complexity of the instructions. Unlike a clean RISC architecture where you might build a wide uniform decoder that can fetch and stream 4, 8, or even more instructions to the processor backend each cycle, doing this at a sustained rate on x86 would require a much wider fetch (as each instruction might be from 1 to 15 bytes in length), and a lot of bit and byte swizzling, instruction length boundary detection, and so on. It’s a huge potential bottleneck, so the uop cache allows a higher sustained rate of instruction throughput than otherwise.

Furthermore, adding the uop cache means that the backend of a modern x86 processor isn’t all that different from any other RISC machine. If you want to learn more about that, search for the original “RISC86” work done by Greg Favor at NexGen (later AMD).

The problem with having a uop cache is that it adds an observable timing side-channel into the mix. Depending upon which instructions are executing, they may be fetched from the uop cache, or they may have to be decoded directly from the x86 macro ops. Worse, the uop cache is populated during instruction fetch/decode prior to any part of the backend getting involved. This means that existing “Spectre” mitigations that rely upon controlling execution happen too late in the pipeline – the offending uops are already present by then.

The paper first determines the “organization” (cache size and structure – sets and ways, etc.) and then conclusively shows that the uop cache implementation used by modern Intel processors features a “hot replacement” policy that guarantees recently executed uops will be present. It then introduces a “tiger” / “zebra” displacement attack that can be used by an attacker in order to determine which uops were recently executed. The uop cache structure used by Intel in their popular “Skylake” derived designs is revealed to feature lines of 6 uops that are organized into an 8-way set-associative cache with 32 sets. Instructions are “streamed” out of the uop cache set-by-set sequentially, following various rules.
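Given that organization, the set a code address maps to can be sketched as below. This is a simplification on my part rather than a quote from the paper: it assumes the set index is taken directly from address bits [9:5], i.e. the 32-byte region number modulo the 32 sets.

```c
#include <stdint.h>

/* Hypothetical helper, not from the paper: which uop-cache set a code
 * address maps to, under the reported Skylake-class organization of
 * 32 sets x 8 ways, 6 uops per line, with 32-byte code regions mapping
 * to sets. Assumes the set index is simply address bits [9:5]. */
static unsigned uop_cache_set(uintptr_t code_addr)
{
    return (unsigned)((code_addr >> 5) & 31u); /* drop 32-byte offset, keep 5 set bits */
}
```

With this mapping, two code blocks placed 1024 bytes apart (32 regions of 32 bytes) contend for the same set, which is the kind of placement used to construct conflicting “tiger” regions.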

It is claimed that Intel processors do not competitively share uop cache space between sibling “Hyperthreads” of an SMT (Simultaneous Multi-Threading) core. This means that the uop cache is split physically (but evenly) between two SMT threads in a way that one cannot measure the other. However, the researchers claim that certain AMD processors do have competitive sharing of SMT resources, meaning that a sibling thread might be able to monitor the uop cache footprint of the other. At the same time, it should be noted that the entire industry is moving away from a model of mixing trusted and untrusted workloads across sibling hyperthreads. Today, it is more common to give siblings only to the same workload, precisely because of the number of concerns that have emerged about SMT.

The rules for which instructions can live in the uop cache include a limit on the number of uops per macro instruction (“a given 32-byte code region may consume a maximum of 3 lines in the set (i.e. up to 18 micro-ops)”), as well as requiring that an unconditional branch must be the last instruction in a uop group. There are many other constraints, which when taken together with the determined replacement policy allow sequences of instructions to be manufactured that will have a known layout in the uop cache. One group of “tigers” interferes with other “tigers” because they intentionally map to the same uop cache sets, while “tigers” and “zebras” are constructed to be mutually exclusive with one another.

As a result of determining the behavior of the uop cache, it is possible to construct sequences of code that can be used for covert communication through the uop cache (fun), or (worse) leverage measurements of the uop cache to determine when a data dependent branch has caused instructions for that dependent branch to be populated into the cache.

The latter case is most dangerous. It allows for (potential) circumvention of some of the Spectre-v1 (bounds check bypass) mitigations commonly adopted in software. Let’s remind ourselves how Spectre worked. It all comes down to memory being dramatically slower than compute, meaning that processors need to frequently wait for reads (loads) from memory to complete. Out-of-Order execution allows the backend to get around this in two ways. First, if a load is outstanding but some immediately subsequent code does not depend upon the result, then the subsequent code can be executed and the ordering restored later. Second, the processor can enter a mode of “speculation” in which it guesses the result of the load. Ignoring value prediction for this article, consider this instead:

#include <stddef.h>
#include <stdint.h>

char array[1024];
int array_size = 1024;

char secret[1024];
extern uint8_t victim_function(size_t i) {
    // bounds check (i is unsigned, so i >= 0 always holds):
    if (i >= 0 && i < array_size) {
        // misspeculation of this branch
        // bypasses the bounds check
        return array[i];
    }
    return -1;
}

Listing: Victim Method for the Variant-1 Attack (as given in “Listing 4” in the paper)

When the processor reaches the “if” branch above, which performs the bounds check on the attacker-controlled “i” variable, it does not know whether to proceed until it has “resolved” the branch by performing the necessary load of array_size and the comparison. While this load is happening, the processor may nonetheless speculate ahead that the bounds check will succeed, especially if it has done so in the past (and thus the branch predictor has been “trained” to behave a certain way). In Spectre, the attacker causes secret leakage by first exceeding a bounds check such that “secret” which immediately follows “array” in memory becomes the target, and then finding some “gadget” in the code following the bounds check that performs a second memory access based upon the value of the first. This data-dependent second access leaves breadcrumbs in the L1 data cache of the processor.

In Spectre, the second access after the first is important because it causes the L1 data cache at a predictable location (based upon its organization) to become populated based upon whether or not the first (secret) memory had a certain value. Timing analysis of the data cache access latency at that second location can subsequently be used to determine whether it was populated or not. Through careful analysis of target programs, it is possible to find sufficient Spectre “gadgets” to leak secrets one bit at a time (or better).
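The cache-timing measurement itself can be sketched as a flush+reload style probe. This is a generic illustration, not code from the paper, and it assumes an x86 compiler providing the `_mm_clflush`/`_mm_mfence`/`__rdtscp` intrinsics:

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

/* Hypothetical flush+reload probe: time a single load in cycles.
 * A "fast" reload means the line was already cached, i.e. something
 * touched it (possibly only speculatively) since it was last flushed. */
static uint64_t probe_latency(const volatile uint8_t *p)
{
    unsigned int aux;
    _mm_mfence();                     /* keep earlier memory traffic out of the timing */
    uint64_t t0 = __rdtscp(&aux);     /* rdtscp orders against prior instructions */
    (void)*p;                         /* the timed load */
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}
```

An attacker would `_mm_clflush()` the target line, let the victim run, then compare `probe_latency()` against a calibrated threshold (roughly tens of cycles for a cache hit versus hundreds for a miss).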

Mitigating Spectre-v1 typically requires restraining speculation beyond sensitive branches (either identified by manual review, or in an automated fashion using tooling or compilers). One way in which this is performed is to use a form of serializing memory barrier known as a load fence (LFENCE) on x86. Inserting an LFENCE as a Spectre mitigation works because it seeks to prevent the backend of the machine from executing any further loads beyond the potentially dangerous load. However, it apparently doesn’t prevent the frontend of the machine from potentially fetching code (e.g. from a branch) beyond the bounds check.
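As a sketch of what that mitigation looks like, here is the victim function from the listing above with an LFENCE inserted after the bounds check (illustrative only, using the `_mm_lfence` intrinsic from the standard x86 headers):

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_lfence */

char array[1024];
int array_size = 1024;

/* Victim function with the common LFENCE mitigation applied. */
uint8_t victim_function_fenced(size_t i)
{
    if (i < (size_t)array_size) {
        /* Serialize here: no later load executes until the bounds
         * check has resolved, blocking the classic Spectre-v1 data
         * leak. Per the paper, however, the front end may still
         * fetch and decode code beyond this point. */
        _mm_lfence();
        return (uint8_t)array[i];
    }
    return (uint8_t)-1;
}
```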

In the paper, the researchers replace a secret-dependent second load with a secret-dependent jump or function call. The latter causes the processor to begin preparing to execute the instructions following the branch, ensuring they are present in the uop cache. Using the “tiger” and “zebra” analysis methods previously developed in the paper, the researchers are subsequently able to determine the presence or absence of uops and to reconstruct the secret. Finally (of course) they were able to find a few example “gadgets” in the Linux kernel source that could be theoretically exploited to leak data in this manner.
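The shape of that substitution can be sketched as follows (hypothetical code for illustration, not the researchers' actual gadgets; the `last_path` variable merely stands in for the uop-cache footprint an attacker would measure):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gadget shapes, for illustration only. */

static uint8_t array1[4] = { 1, 2, 3, 4 };  /* attacker-indexed data  */
static uint8_t array2[5 * 512];             /* data-cache probe array */
static int last_path;                        /* stands in for the uop-
                                                cache footprint here  */

/* Classic Spectre-v1 transmission: the second LOAD depends on the
 * secret byte, leaving a footprint in the data cache. */
static uint8_t load_gadget(size_t i)
{
    return array2[array1[i] * 512];
}

static void taken(void)     { last_path = 1; }
static void not_taken(void) { last_path = 0; }

/* Uop-cache variant: a BRANCH depends on the secret byte, so the code
 * at the chosen target is fetched and decoded; its uops land in the
 * uop cache even if a fence keeps the backend from executing loads. */
static void branch_gadget(size_t i)
{
    if (array1[i] & 1)
        taken();
    else
        not_taken();
}
```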

The researchers primarily focus on same address space / same privilege level attacks where a different context (e.g. managed code runtime, such as Java or JavaScript) might be interpreting untrusted code within a sandbox. Practical attacks of this form are likely to be fairly minimal, however, due to the extensive work done by the industry in the wake of Spectre (and its companions) to separate untrusted code into separate address spaces. It is, for example, no longer the case that web browsers are running untrusted JavaScript code within an interpreter that exists within the same address space as trusted browser code.

More worrying, however, is the potential for cross-domain attacks (one privilege level to another), or cross-SMT-thread attacks. The researchers claim that they were able to perform user-to-kernel type attacks (and presumably kernel-to-hypervisor would work in place of user-to-kernel). They were not able to target Intel’s SGX (Software Guard Extensions, also known as enclaves) because entry into an enclave causes the processor’s instruction TLB (Translation Lookaside Buffer, a structure that caches memory translations between the “virtual” memory view used by programs, and the physical memory underneath) to be invalidated, and this has the side-effect of invalidating the uop cache as a benefit.

But the cross-domain attack is a potential concern. If it turns out to be significant, then a logical fix would be to invalidate the uop cache when crossing this privilege boundary. We know that this can be done by invalidating the iTLB (and we know this is also a side effect of the original “Meltdown” Page Table Isolation or “PTI” solution as it causes the kernel and userspace to use entirely different address spaces – what we don’t know is whether all of the optimizations for this such as PCID invalidations also cause uop cache invalidation).

Overall, the paper is interesting reading. It’s far from the world-ending sensationalism implied by the “Defenseless” language on the Virginia site, and in the press pick up thus far. Quite the contrary really. The industry had a huge problem on its hands with Spectre, and as a direct consequence a great deal of effort was invested in separating privilege, isolating workloads, and using different contexts. There may be some cleanup needed in light of this latest paper, but there are mitigations available, albeit always at some performance cost.

by jonmasters at 01 May 2021, 22:47

01 May 2021

Marcin Juszkiewicz

What is ‘a new computer’?

During the last months I have had several discussions about buying a new computer. I helped a few friends with choosing a setup etc. And several of them were surprised when I told them that I have never bought ‘a new computer’.

8-bit era

My first computer was an Atari 65XE, which my parents bought as a new computer. I learnt BASIC, tried a few other programming languages (Forth, anyone?) and played games. But it was not a platform for long-term use.

Amiiiiga!

Two years later I earned money collecting berries in Swedish forests and bought my own first computer — a used Amiga 600. And two weeks later I added a 425 MB hard disk inside. A nice improvement, still with the same 12” green monitor. More programming, fewer games. Demoscene watching started.

Time passed; I sold the A600 and bought a used Amiga 1200 instead. Same hdd and monitor (also the same mouse, as I liked the old one more). Then a CPU accelerator card, then another, a new ATA controller, a new hdd, a used cd-rom drive and a used 14” VGA mono monitor.

AmigaOS was a great operating system, but the platform was dying: hardware was expensive and slow, and there was no new software.

Let’s move to PC

The year 2000 was the moment when I decided to abandon AmigaOS and buy a PC, which reused the hdd, cd-rom drive and monitor. In some sense it was a new computer, but it still kept something from the previous system. In the following years some minor/major upgrades happened.

In 2006 I switched the architecture of my desktop system from x86 to x86-64, in as cheap a way as possible. It was the time when embedded Linux was starting to be no longer “just a hobby”.

Years passed; processors, mainboards, memory, storage, cases, expansion cards, monitors etc. kept changing. But there was still no point of ‘ok, let me buy a whole new computer’, as most of the components were reused.

Do laptops count?

There were several laptops in the meantime. One of them (an Asus UL30A) was even brand new. But, contrary to other users, I use laptops only during travel — at home they usually sit connected to power and are sometimes used as headless build machines (as I tend to use a different Linux distribution there than on my desktop).

Arm?

And none of the Arm systems I use count. I bought a Sharp Zaurus SL-5500A (no longer own it), a Nokia N810 tablet and some small board computers like the Wandboard (sold), Raspberry Pi 3 (sold) or RockPro64. Android devices do not count either.

Summary

Who knows, maybe one day I will buy a new computer. A whole one — case, mainboard, processor, memory, storage, graphics. Or something NUC-like.

It is just that there has been nothing interesting enough so far to justify such a purchase. Or I am not lazy enough to just buy a whole workstation instead of building it on my own ;D

by Marcin Juszkiewicz at 01 May 2021, 16:08

22 April 2021

Marcin Juszkiewicz

Sometimes one tweet is enough

Two weeks ago I wrote on Twitter:

Is there some company with spare AArch64 CPU cycles?

Opendev (project behind OpenStack and some more) would make use of another aarch64 server offer.

Current one is iirc paid by @Arm, hosted by @equinixmetal and operated by @LinaroOrg.

Why did I do that? Maybe frustration, maybe burnout. Hard to tell. But I did, without targeting any Arm-related company, as I did not want to force anyone to do anything.

Response

A few hours later I got an email from Peter Pouliot from Ampere Computing, with the information that they had provided hardware to the Oregon State University Open Source Lab (OSUOSL in short) and that we might get nodes there.

As I have no idea how exactly the Opendev infrastructure works, I added Kevin Zhao to the thread. He is a Linaro employee working on all instances of the Linaro Developer Cloud, and he maintained all the AArch64 resources provided to Opendev.

Process

Kevin added the Opendev infra admins: Clark Boylan and Ian Wienand. Peter added Lance Albertson from OSUOSL. I was just one of the addresses in the emails, watching how things went.

And it went nicely. It was a pleasure to read how it progressed. Two days, 8 emails, and the arrangements were made. Then changes to the Opendev infrastructure configuration followed, and a week later ‘linaro-us’ was no longer the only provider of AArch64 nodes.

Result

Opendev has two providers of AArch64 nodes now:

  • linaro-us-regionone
  • osuosl-regionone

The first one is paid for by Arm Ltd, hosted at Equinix Metal (formerly Packet) and operated by Kevin Zhao from Linaro.

The second one runs on Ampere-provided hardware and is operated by OSUOSL admins.

The ‘check-arm64’ pipeline on Opendev CI gets less clogged. And I hope that more and more projects will use it to test their code not only on x86-64 ;D

by Marcin Juszkiewicz at 22 April 2021, 06:35

16 April 2021

Gema Gomez

My quilting journey

It has been almost a year since I last posted on the blog. And what a year it has been. I have been busy learning a new craft and sewing. I have learnt how to sew straight seams (really straight seams) and I have learnt how to quilt (not very well yet, but finished is better than perfect).

Spending over a year not leaving the house has been quite an experience. We used to do holiday trips regularly, or day trips to places to enjoy the local cuisine and amenities. This has not been possible due to the pandemic. Not only that, some loved ones are no longer with us. It has been a tough year. Protecting the loved ones meant not seeing them, and not allowing the rules to be relaxed. Last year I also happened to find myself out of a job, and took a few months to recover from stress and really figure out what I wanted to do next. In the midst of all this, I found focus in learning to quilt. Creating things that could then be gifted to loved ones was one of the many appeals.

Shopping has also changed during the pandemic. I used to love going out on a weekend and doing some shopping. We have not visited a shop since February 2020. Groceries have all been bought online. All the fabric and tools I needed for the craft were also purchased online. All the advice and training I got came from awesome crafty communities online. I found new friends online and had meetings as part of sew-alongs. It has been quite a year; I cannot wait to meet all the new friends in person.

Here is the first quilt I made (and the last, if you consider that all the other ones are still unfinished: just pieced, in the process of being pieced, or only planned). For fabric, I got a precut set from Jordan Fabrics, a Hoffman Metallic Sparkle & Fade Pre-Cut 12-Block Log Cabin Quilt Kit - Metallic Shadows. I chose a precut set because I was not really sure how to go about cutting fabric for a quilt, so I thought that would be handy to start with. It came with its own pattern too, a very simple one. It took a while to arrive, all the way from the US, and the pieces looked gorgeous:

Pre cut pieces

After the piecing, I basted the quilt. This is the process of putting the backing, the batting (filler) and the quilt top together to be able to start quilting. I basted it on the kitchen floor, using frog tape to secure the backing fabric to the floor (I saw this tip in a few online tutorials).

Basting

Once basted, the quilting can start. This is as much fun as piecing, or more. I decided to use a ruler because free motion quilting felt really unnatural and clunky at this stage.

Quilting

I did this quilt fully on my home sewing machine. The throat space on that machine is 8 inches. Maneuvering the quilt was a bit challenging, but not a lot, since it was a smallish quilt. I had an issue with thread: it kept breaking when I was quilting the border. This was very annoying and it almost made me give up. It turns out the thread I was using was in bad condition; it would break much more easily than others I had from the same manufacturer and weight. So changing the thread helped a lot.

Once the quilt was finished, I did the binding with grey strips and shipped it to my mum, whom I have not seen since 2019. This project had so many firsts for me. First quilt, first pin-basted quilt (bigger than a sample size), first time with rulers, and first time binding.

Finished Close up

by Gema Gomez at 16 April 2021, 23:00

08 April 2021

Marcin Juszkiewicz

Five years @linaro.org

Five years ago I got an email from Kristine titled “Linaro Assignee On-Boarding”. Those have been busy years.

OpenStack

According to stackalytics, I have done 1144 reviews so far:

OpenStack project  Reviews
kolla                  641
kolla-ansible          445
releases                14
nova                    11
loci                     9
requirements             7
devstack                 4
tripleo-ci               3
pbr                      2
magnum                   2
Those were of different kinds — from simple fixes to new features. Sometimes it was hard to convince projects that my idea made sense. There were patches with over 50 revisions. Some needed to be split into smaller ones, reordered etc.

And in the end, AArch64 is just another architecture in the OpenStack family. Linaro Developer Cloud managed to pass the official certifications etc.

Python

Countless projects. From suggesting ‘can you publish an aarch64 wheel’ to adding the code to make it happen. Or working on getting manylinux2014 working properly.

Linaro CI

We have a Jenkins setup at Linaro, with several build machines attached to it and countless jobs running there. I maintain several of them, and my job is to take care of them and add new ones when needed.

Servers

Due to my work in the Linaro Enterprise Group (later renamed to the Linaro Datacenter & Cloud Group) I have dealt with many AArch64 server systems, from HPE Moonshot to Huawei D06, with Qualcomm Falkor in the meantime. I used CentOS 7/8 on them, and Debian 9/10/11. I added the needed config entries to the Debian kernel to get them working out of the box (iirc excluding the M400 cartridges, which no one maintained any more).

Conferences

I gave an “OpenStack on AArch64” talk at three conferences in a row: first at Linaro Connect LAS16 (as a group talk), then at Pingwinaria in Poland, and finally in Kiev, Ukraine. Same slides, translated English -> Polish -> English and updated each time.

Since then I have not given talks at conferences any more. I prefer to attend other people's talks, or spend time on hallway discussions. Linaro Connect events were a good place for those (and I hope they will be in the future too).

At the end of each Connect there was a listing of every person who had worked at Linaro for 5 or 10 years. Due to the pandemic I missed that part. But I still hope for that memorial glass ;D

by Marcin Juszkiewicz at 08 April 2021, 12:08

01 April 2021

Marcin Juszkiewicz

Let’s play with some new stuff

Two days ago Arm announced the Armv9 architecture. Finally we can discuss it in the open instead of saying “I prefer not to talk about this” (because of NDAs etc.).

New things

There are several new things in v9: SVE2, Spectre/Meltdown-like mitigations, memory tagging, realms… And some of them are present in v8 already (the mitigations are v8.5, IIRC).

Hardware

But how will it work in hardware? There were no new CPU core announcements, so we need to wait for Arm Neoverse N2 derived designs (as those will be v9).

As usual, mobile phones and tablets will get it first. Then probably Apple will put it into newer MacBooks. A decade later: servers and workstations.

And there are always those machines in labs. Packed with NDAs, access queues etc…

$ uname -a
Linux bach 5.11.0-73 #1 SMP Fri Mar 12 11:34:12 UTC 2021 aarch64 GNU/Linux
$ head -n8 /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp rme
                  asimdhp cpuid sb16 fp16 asimdrdm uafp jscvt fcma sb128 lrcpc
                  dcpop sha3 sm3 sm4 asimddp afd sha512 sve asimdfhm dit ilrcpc
                  rng flagm tme ssbs paca pacg sb dcpodp ac97 flagm2 frint mte
                  sve2 bf16 bti
CPU implementer : 0x4a
CPU architecture: 9
CPU variant     : 0x0
CPU part        : 0xf02
CPU revision    : 0

And sorry, no benchmarks allowed on non-mass-market hardware. The good part? It is still AArch64, so no recompilation is required. Some software packages will get their first set of SVE2 improvements soon.

That was only a joke

The truth is that even labs do not have such stuff. While most of the features listed in the /proc/cpuinfo output exist in newer Arm v8 cores, some of them were added there for fun:

  • afd (April Fools’ Day)
  • sb16 and sb128 (SoundBlaster 16/128)
  • ac97 (yet another sound device)
  • uafp (Use-After-Free Protection)

Thanks

I would like to thank a few people.

Arnd Bergmann pointed out that two fields related to CPU are wrong:

  • implementer should be an ASCII code (so I changed it from 0xe3 to 0x4a (‘J’ as in Joke))
  • the part field is just 12 bits (so I changed it from 0x1f02 to 0xf02)
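The implementer change is easy to check from any shell (plain printf; octal 112 equals hex 0x4a):

```shell
# 0x4a hex == 0112 octal == ASCII 'J'
printf '\112\n'   # prints: J
```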

Mark Brown spotted the duplicated ‘rng’ feature. And thanks to Ryan Houdek, who wrote about the ‘tme’ feature.

by Marcin Juszkiewicz at 01 April 2021, 07:48

27 March 2021

Neil Williams

Free and Open

A long time, no blog entries. What's prompted a new missive?

Two things:

  • my own hankering to get back into free software after too many years faced with daily struggles against proprietary software
  • the upcoming GR in Debian.

All the time I've worked with software and computers, the lack of diversity in the contributors has irked me. I started my career in pharmaceutical sciences where the mix of people, at university, would appear to be genuinely diverse. It was in the workplace that the problems started, especially in the retail portion of community pharmacy with a particular gender imbalance between management and counter staff. Then I started to pick up programming. I gravitated to free software for the simple reason that I could tinker with the source code. To do a career change and learn how to program with a background in a completely alien branch of science, the only realistic route into the field is to have access to the raw material - source code. Without that, I would have been seeking a sponsor, in much the same way as other branches of science need grants or sponsors to purchase the equipment and facilities to get a foot in the door. All it took from me was some time, a willingness to learn and a means to work alongside those already involved. That's it. No complex titration equipment, flasks, centrifuges, pipettes, spectrographs, or petri dishes. It's hard to do pharmaceutical science from home. Software is so much more accessible - but only if the software itself is free.

Software freedom removes the main barrier to entry. Software freedom enables the removal of other barriers too. Once the software is free, it becomes obvious that the compiler (indeed the entire toolchain) needs to be free, to turn the software into something other people can use. The same applies to interpreters, editors, kernels and the entire stack. I was in a different branch of science whilst all that was being created and I am very glad that Debian Woody was available as a free cover disc on a software magazine just at the time I was looking to pick up software programming.

That should be that. The only next step to be good enough to write free software was the "means to work alongside those already involved". That, it turns out, is much more than just a machine running Debian. It's more than just having an ISP account and working email (not commonplace in 2003). It's working alongside the people - all the people. It was my first real exposure to the toxicity of some parts of many scientific and technical arenas. Where was the diversity? OK, maybe it was just early days and the other barriers (like an account with an ISP) were prohibitive for many parts of the world outside Europe & USA in 2003, so there were few people from other countries but the software world was massively dominated by the white Caucasian male. I'd been insulated by my degree course, and to a large extent by my university which also had courses which already had much more diverse intakes - optics and pharmacy, business and human resources. Relocating from the insular world of a little town in Wales to Birmingham was also key. Maybe things would improve as the technical barriers to internet connectivity were lowered.

Sadly, no. The echo chamber of white Caucasian input has become more and more diluted as other countries have built the infrastructure to get the populace online. Debian has helped in this area, principally via DebConf. Yet only the ethnicity seemed to change, not the diversity. Recently, more is being done to at least make Debian more welcoming to those who are brave enough to increase the mix. Progress is much slower than the gains seen in the ethnicity mix, probably because that was a benefit of a technological change, not a societal change.

The attitudes so prevalent in the late 20th century are becoming less prevalent amongst, and increasingly abhorrent to, the next generation of potential community members. Diversity must come or the pool of contributors will shrink to nil. Community members who cling to these attitudes are already dinosaurs and increasingly unwelcome. This is a necessary step to retain access to new contributors as existing contributors age. To be able to increase the number of contributors, the community cannot afford to be dragged backwards by anyone, no matter how important or (previously) respected.

Debian, or even free software overall, cannot change all the problems with diversity in STEM but we must not perpetuate the problems either. Those people involved in free software need to be open to change, to input from all portions of society and welcoming. Puerile jokes and disrespectful attitudes must be a thing of the past. The technical contributions do not excuse behaviours that act to prevent new people joining the community. Debian is getting older, the community and the people. The presence of spouses at Debian social events does not fix the problem of the lack of diversity at Debian technical events. As contributors age, Debian must welcome new, younger, people to continue the work. All the technical contributions from the people during the 20th century will not sustain Debian in the 21st century. Bit rot affects us all. If the people who provided those contributions are not encouraging a more diverse mix to sustain free software into the future then all their contributions will be for nought and free software will die.

So it comes to the FSF and RMS. I did hear Richard speak at an event in Bristol many years ago. I haven't personally witnessed the behavioural patterns that have been described by others but my memories of that event only add to the reality of those accounts. No attempts to be inclusive, jokes that focused on division and perpetuating the white male echo chamber.

I'm not perfect, I struggle with some of this at times. Personally, I find the FSFE and Red Hat statements much more in line with my feelings on the matter than the open letter which is the basis of the GR. I deplore the preliminary FSF statement on governance as closed-minded, opaque and archaic. It only adds to my concerns that the FSF is not fit for the 21st century. The open letter has the advantage that it is a common text which has the backing of many in the community, individuals and groups, who are fit for purpose and whom I have respected for a long time.

Free software must be open to contributions from all who are technically capable of doing the technical work required. Free software must equally require respect for all contributors and technical excellence is not an excuse for disrespect. Diversity is the life blood of all social groups and without social cohesion, technical contributions alone do not support a community. People can change and apologies are welcome when accompanied by modified behaviour.

The issues causing a lack of diversity in Debian are complex and mostly reflective of a wider problem with STEM. Debian can only survive by working to change those parts of the wider problem which come within our influence. Influence is key here, this is a soft issue, replete with unreliable emotions, unhelpful polemic and complexity. Reputations are hard won and easily blemished. The problems are all about how Debian looks to those who are thinking about where to focus in the years to come.

It looks like the FSFE should be the ones to take the baton from the FSF - unless the FSF can adopt a truly inclusive and open governance.

My problem is not with RMS himself, what he has or has not done, which apologies are deemed sincere and whether behaviour has changed. He is one man and his contributions can be respected even as his behaviour is criticised. My problem is with the FSF for behaving in a closed, opaque and divisive manner and then making a governance statement that makes things worse, whilst purporting to be transparent. Institutions lay the foundations for the future of the community and must be expected to hold individuals to account. Free and open have been contentious concepts for the FSF, with all the arguments about Open Source which is not Free Software. It is clear that the FSF do understand the implications of freedom and openness. It is absurd to then adopt a closed and archaic governance. A valid governance model for the FSF would never have allowed RMS back onto the board, instead the FSF should be the primary institution to hold him, and others like him, to account for their actions. The FSF needs to be front and centre in promoting diversity and openness. The FSF could learn from the FSFE.

The calls for the FSF to adopt a more diverse, inclusive, board membership are not new. I can only echo the Red Hat statement:

in order to regain the confidence of the broader free software community, the FSF should make fundamental and lasting changes to its governance.

...

[There is] no reason to believe that the most recent FSF board statement signals any meaningful commitment to positive change.

And the FSFE statement:

The goal of the software freedom movement is to empower all people to control technology and thereby create a better society for everyone. Free Software is meant to serve everyone regardless of their age, ability or disability, gender identity, sex, ethnicity, nationality, religion or sexual orientation. This requires an inclusive and diverse environment that welcomes all contributors equally.

The FSF has not demonstrated the behaviour I expect from a free software institution and I cannot respect an institution which proclaims a mantra of freedom and openness that is not reflected in the governance of that institution.

The preliminary statement by the FSF board is abhorrent. The FSF must now take up the offers from those institutions within the community who retain the respect of that community. I'm only one voice, but I would implore the FSF to make substantive, positive and permanent change to the governance, practices and future of the FSF or face irrelevance.

by Neil Williams at 27 March 2021, 13:26

25 March 2021

Marcin Juszkiewicz

From a diary of AArch64 porter — manylinux2014

Python wheels… Everyone loves them, many people curse when they are not available for their setup. I am in both groups every time I have to do something more complex with Python.

So today I will show how to build a Python wheel in a quick way.

What is manylinux?

The Linux world has a lot of distributions. Even more when you add their releases into the mix. And they ship different versions of Python. At the same time we have PyPI, which works as a repository of ready-to-use Python packages.

So the manylinux idea was created to define minimal requirements for building Python packages, making sure that you can install them on any distribution.

So far several versions have been built:

name            base distribution    PEP
manylinux1      CentOS 5             PEP 513
manylinux2010   CentOS 6             PEP 571
manylinux2014   CentOS 7             PEP 599
manylinux_2_24  Debian 9 ‘stretch’   PEP 600

As you can see, old releases are used to make sure that the resulting binaries work on any newer distribution. manylinux2014 added non-x86 architectures (aarch64, ppc64le, s390x).

Each image contains several versions of Python binaries ready to use in the /opt/python/ directory.

Manylinux images are distributed as container images from the pypa repository on quay.io. Run them under Docker, Kubernetes, Podman etc.

The source code is available in the ‘manylinux’ repository on GitHub.

Let’s use it!

My work requires me to build TensorFlow 1.15.5, which depends on NumPy 1.18.*. Neither of them is available as a wheel for the AArch64 architecture.

So let me run the container and install NumPy’s dependencies:

$ docker run -it -u root -v $PWD:/tmp/numpy quay.io/pypa/manylinux2014_aarch64
[root@fa339493a417 /]# cd /tmp/numpy/
[root@fa339493a417 /tmp/numpy/]# yum install -y blas-devel lapack-devel
[root@fa339493a417 /tmp/numpy/]# 

The image has several versions of Python installed and I want to build NumPy 1.18.5 for each of them, so my build script is simple:

for py in /opt/python/cp3[6789]*
do
    pyver=$(basename $py)
    # create a virtualenv for each Python version and build the wheel in it
    $py/bin/python -mvenv $pyver
    source $pyver/bin/activate
    pip wheel numpy==1.18.5
    deactivate
done

The result is simple — I got a set of wheel files, one per Python version. But it is not the end of the work, as the NumPy libraries depend on the blas/lapack packages we installed into the system.

Add libraries to the wheel

There is a tool we need to run: “auditwheel”. It inspects the wheel file: all library symbols used, external libraries etc. Then it bundles the required libraries into the wheel file:

INFO:auditwheel.main_repair:Repairing numpy-1.18.5-cp39-cp39-linux_aarch64.whl
INFO:auditwheel.wheeltools:Previous filename tags: linux_aarch64
INFO:auditwheel.wheeltools:New filename tags: manylinux2014_aarch64
INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp39-cp39-linux_aarch64
INFO:auditwheel.wheeltools:New WHEEL info tags: cp39-cp39-manylinux2014_aarch64
INFO:auditwheel.main_repair:
Fixed-up wheel written to /root/wheelhouse/numpy-1.18.5-cp39-cp39-manylinux2014_aarch64.whl
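The log above was produced by auditwheel’s “repair” command, run inside the container on each built wheel (the filename matches my cp39 build; adjust for the other Python versions):

```
$ auditwheel repair numpy-1.18.5-cp39-cp39-linux_aarch64.whl -w /root/wheelhouse/
```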

The file size changed from 13 467 772 to 16 806 338 bytes and the resulting wheel can be installed on any distribution.

Let’s summarise

Manylinux is a great tool for providing Python packages. It is easy to use on a developer’s machine or in CI. And it makes the life of Python users much easier.

by Marcin Juszkiewicz at 25 March 2021, 13:11

14 March 2021

Marcin Juszkiewicz

U-Boot and generic distro boot

Single board computers (SBC) usually come with U-Boot as firmware. There can be some more components like Arm Trusted Firmware, OP-TEE etc. but what the user interacts with is U-Boot itself.

Since 2016 there is the CONFIG_DISTRO_DEFAULTS option in the U-Boot configuration. It selects defaults suitable for booting general purpose Linux distributions. Thanks to it, a board is able to boot most OS installers out of the box without any user interaction.

How?

How does it know how to do that? There are several scripts and variables involved. Run the “printenv” command in the U-Boot shell and you should see some of them, named like “boot_*”, “bootcmd_*”, “scan_dev_for_*”.

In my example I will use the environment from a RockPro64 running U-Boot 2021.01.

I will prettify all scripts for readability. Script contents may be expanded — in such a case I will give the name as a comment and then its content.

Let’s boot

The first variable used by U-Boot is “bootcmd”. It is read to know how to boot the operating system on the board.

In our case this variable has “run distro_bootcmd” in it. So what is in “distro_bootcmd” on the RockPro64 SBC:

setenv nvme_need_init
for target in ${boot_targets}
do 
    run bootcmd_${target}
done

It says that the on-board NVMe needs some initialization and then goes through a set of scripts using the order from the “boot_targets” variable. On the RockPro64 this variable sets the “mmc0 mmc1 nvme0 usb0 pxe dhcp sf0” order, which means:

  • eMMC
  • MicroSD
  • NVME
  • USB storage
  • PXE
  • DHCP
  • SPI flash

Both eMMC and MicroSD look similar: ‘devnum=X; run mmc_boot’ — set the MMC device number and then try to boot by running the ‘mmc_boot’ script:

if mmc dev ${devnum}; then 
    devtype=mmc; 
    run scan_dev_for_boot_part; 
fi

The NVMe one initializes the PCIe subsystem (via “boot_pci_enum”), then scans for NVMe devices (via “nvme_init”) and does similar stuff (here with expanded scripts):

# boot_pci_enum
pci enum

# nvme_init
if ${nvme_need_init}; then 
    setenv nvme_need_init false;
    nvme scan;
fi

if nvme dev ${devnum}; then 
    devtype=nvme; 
    run scan_dev_for_boot_part; 
fi

USB booting goes with “usb_boot”:

usb start;
if usb dev ${devnum}; then 
    devtype=usb; 
    run scan_dev_for_boot_part;
fi

PXE network boot? Initialize USB, scan PCI, get network configuration, do PXE boot:

# boot_net_usb_start
usb start

# boot_pci_enum
pci enum

dhcp; 
if pxe get; then 
    pxe boot; 
fi

The DHCP method feels like a last resort (do not ask me for the meaning of all those variables):

# boot_net_usb_start
usb start

# boot_pci_enum
pci enum

if dhcp ${scriptaddr} ${boot_script_dhcp}; then 
    source ${scriptaddr}; 
fi;

setenv efi_fdtfile ${fdtfile}; 
setenv efi_old_vci ${bootp_vci};
setenv efi_old_arch ${bootp_arch};
setenv bootp_vci PXEClient:Arch:00011:UNDI:003000;
setenv bootp_arch 0xb;

if dhcp ${kernel_addr_r}; then 
    tftpboot ${fdt_addr_r} dtb/${efi_fdtfile};

    if fdt addr ${fdt_addr_r}; then 
        bootefi ${kernel_addr_r} ${fdt_addr_r}; 
    else 
        bootefi ${kernel_addr_r} ${fdtcontroladdr};
    fi;
fi;

setenv bootp_vci ${efi_old_vci};
setenv bootp_arch ${efi_old_arch};
setenv efi_fdtfile;
setenv efi_old_arch;
setenv efi_old_vci;

And the last method is SPI flash:

busnum=0

if sf probe ${busnum}; then
    devtype=sf;

    # run scan_sf_for_scripts; 
    ${devtype} read ${scriptaddr} ${script_offset_f} ${script_size_f}; 
    source ${scriptaddr}; 
    echo SCRIPT FAILED: continuing...
fi

Search for boot partition

Note how all the block devices end with one script: “scan_dev_for_boot_part”. What it does is quite simple:

part list ${devtype} ${devnum} -bootable devplist; 
env exists devplist || setenv devplist 1; 

for distro_bootpart in ${devplist}; do 
    if fstype ${devtype} ${devnum}:${distro_bootpart} bootfstype; then 
        run scan_dev_for_boot; 
    fi; 
done; 
setenv devplist

We know the type and number of the boot device from the previous step, so now we check for bootable partitions. That means the EFI System Partition on GPT disks, and partitions marked as bootable in the case of MBR. If none are present then the first partition is assumed to be the bootable one.
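The same commands can be run by hand in the U-Boot shell; on a board with a single bootable partition on MicroSD the result looks roughly like this (output is illustrative):

```
=> part list mmc 1 -bootable devplist
=> printenv devplist
devplist=1
```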

Search for distribution boot information

Once we have found boot partitions it is time to search for boot information with the “scan_dev_for_boot” script:

echo Scanning ${devtype} ${devnum}:${distro_bootpart}...;
for prefix in ${boot_prefixes}; do 
    run scan_dev_for_extlinux; 
    run scan_dev_for_scripts; 
done;

run scan_dev_for_efi;

Old style OS configuration

First U-Boot checks for an “extlinux/extlinux.conf” file, then goes for the old style “boot.scr” (in uImage and clear text formats). Both of them are checked in the / and /boot/ directories of the partition (those names are in the “boot_prefixes” variable).

Let us look at it:

# scan_dev_for_extlinux
if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${boot_syslinux_conf};then 
    echo Found ${prefix}${boot_syslinux_conf}; 

    # run boot_extlinux; 
    sysboot ${devtype} ${devnum}:${distro_bootpart} any ${scriptaddr} ${prefix}${boot_syslinux_conf}

    echo SCRIPT FAILED: continuing...; 
fi

# scan_dev_for_scripts
for script in ${boot_scripts}; do 
    if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${script}; then 
        echo Found U-Boot script ${prefix}${script}; 

        # run boot_a_script; 
        load ${devtype} ${devnum}:${distro_bootpart} ${scriptaddr} ${prefix}${script}; 
        source ${scriptaddr}

        echo SCRIPT FAILED: continuing...; 
    fi; 
done

EFI compliant OS

And finally U-Boot checks for EFI style BootOrder variables and the generic OS loader path:

# scan_dev_for_efi
setenv efi_fdtfile ${fdtfile};
for prefix in ${efi_dtb_prefixes}; do
    if test -e ${devtype} ${devnum}:${distro_bootpart} ${prefix}${efi_fdtfile}; then 
        # run load_efi_dtb; 
        load ${devtype} ${devnum}:${distro_bootpart} ${fdt_addr_r} ${prefix}${efi_fdtfile}
    fi;
done;

# run boot_efi_bootmgr;
if fdt addr ${fdt_addr_r}; then 
    bootefi bootmgr ${fdt_addr_r};
else 
    bootefi bootmgr;
fi

if test -e ${devtype} ${devnum}:${distro_bootpart} efi/boot/bootaa64.efi; then
    echo Found EFI removable media binary efi/boot/bootaa64.efi; 

    # run boot_efi_binary; 
    load ${devtype} ${devnum}:${distro_bootpart} ${kernel_addr_r} efi/boot/bootaa64.efi; 
    if fdt addr ${fdt_addr_r}; then 
        bootefi ${kernel_addr_r} ${fdt_addr_r};
    else 
        bootefi ${kernel_addr_r} ${fdtcontroladdr};
    fi

    echo EFI LOAD FAILED: continuing...;
fi; 
setenv efi_fdtfile

Booted

At this moment the board should be either in the OS or in an OS loader (an EFI binary).

Final words

All that work of searching for boot media, boot scripts, boot configuration files, OS loaders, EFI BootOrder entries etc. is done without any user interaction. Every bootable medium is checked and tried.

If I added SATA controller support to the U-Boot binary then all disks connected to it would also be checked, without any code or environment changes on my side.

So if your SBC has some weird setup then consider moving to the generic distro one. Boot fresh mainline U-Boot, store a copy of your existing environment (“printenv” shows it) and then reset to the generic one with the “env default -a” command. You will probably need to set the MAC addresses of the network interfaces again.
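That migration might look like this in the U-Boot shell (the MAC variable is usually “ethaddr” but its name, like the address itself, is board specific, so treat the setenv line as an example):

```
=> printenv
... (copy this output somewhere safe first) ...
=> env default -a
=> setenv ethaddr 00:11:22:33:44:55
=> saveenv
```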

by Marcin Juszkiewicz at 14 March 2021, 09:14

07 March 2021

Marcin Juszkiewicz

Time to change something on the blog

I had some ideas for improvements to the website. And finally found some time to implement them.

Series of posts

One of the changes was implementing ‘series of posts’ to make it easier to find posts on one topic. For now there are two of them:

  • Standards in Arm space
  • From a diary of AArch64 porter

Each post in a series has a list of the whole series at the top. I may group some other posts into additional series.

Mobile devices fixes

From time to time I got email from Google bots saying that some things on my website needed improvement. Most of them were about mobile devices. So I went through Lighthouse audits and made some changes:

  • top menu is one entry per line
  • clickable lists have more padding between entries
  • removed ‘popular tags’ group from sidebar as it was not used
  • display more entries in ‘recent posts’ sidebar section

About me section

I also added an ‘About me’ section to the sidebar. I often give links to my blog posts on several instant messaging channels (IRC, Discord, Telegram) and when people realize that I wrote them there is a strange moment:

<hrw> irc_user: https://marcin.juszkiewicz.com.pl/2020/06/17/ebbr-on-rockpro64/

<irc_user> hrw: yes, I know that article, that is why I want to try it, I’ll get the RockPro64 today if all goes well! :)

<irc_user> I wonder whether the images from uboot-images-armv8-2020.10-2.fc33.noarch.rpm also have EBBR support for rockpro64 though, without needing to download a random binary from Marcin’s website :)

<hrw> iirc Fedora images are not built for writing into SPI flash

<hrw> irc_user: and you reminded me that I need to add text block to website

<irc_user> hrw: ah, you are Marcin! :-)

<irc_user> now I feel stupid

<irc_user> excellent blog you have! thanks so much for that, I learned a lot!

<hrw> thx

Now the info about my nickname is right at the top of the page (except on mobile).

Useful tables

I also added a list of tables from my side projects:

  • BSA and SBSA checklist
  • Linux system calls for all architectures

I know that they have some users but now both are more visible.

by Marcin Juszkiewicz at 07 March 2021, 14:42

06 March 2021

Bin Chen

Architecting on AWS Course Summary

The Architecting on AWS course has 13 modules (marked as mX below). Each module builds on the previous one, either adding functionality or improving the architecture characteristics laid out by the five-pillar Well-Architected Framework.

Below is a summary of each module; notice how they build on one another progressively. As an exercise, following the text description, you can draw out a "well-architected" architecture diagram for a typical 3-tier application. The components and concepts involved are fundamental for building IT infrastructure on AWS services. Mastering them is critical for building a "best-practices-followed" cloud solution, or for passing your AWS exam, for that matter.

  • use S3 to host a static website
  • create an EC2 instance and mount EFS (m3, adding the compute layer)
  • add an RDS primary and allow EC2 connections (m4, adding the database layer)
  • segregate the 3 tiers into different subnets; use an internet gateway for external ingress and a NAT gateway for egress to the public internet; use security groups for instance firewalls and network ACLs for subnet network control (m5)
  • use an ELB to load balance your instances spread across multiple AZs (for the database you have to use a primary and a synced-up secondary) to create an HA structure (m6, which also covers VPC peering and VPN)
  • put Route 53/DNS in front of the ELB for multi-region HA (m6)
  • use IAM to control resource access inside the infrastructure as well as external access (m7)
  • auto scale, through monitoring, the EC2 app server instances for elasticity and fault tolerance/self-healing (m8)
  • define all the above using IaC for automation and repeatability (m9)
  • use caching and CloudFront/CDN to improve performance (m10)
  • decouple the web tier and app tier using (another) load balancer, in addition to the ELB in front of the web servers (m11)
  • move from IaaS (EC2-based) to microservices (CaaS) and Lambda (FaaS) (m12)
  • improve reliability through disaster planning and recovery (m13).
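As a taste of the IaC step (m9), the very first item above (static website hosting on S3) might be sketched as a minimal CloudFormation fragment; the resource name is illustrative, not from the course:

```yaml
# Minimal CloudFormation sketch: an S3 bucket configured for static
# website hosting. Real deployments would add a bucket policy and CDN.
Resources:
  StaticSiteBucket:
    Type: AWS::S3::Bucket
    Properties:
      WebsiteConfiguration:
        IndexDocument: index.html
        ErrorDocument: error.html
```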

by Bin Chen at 06 March 2021, 22:25

10 December 2020

Ard Biesheuvel

AArch64 option ROMs for AMD GPUs

Usually, on AArch64 systems with proprietary UEFI firmware, a HDMI/DP display does not light up until the point where the OS driver takes control. This makes it tedious to access the UEFI configuration menus or the GRUB bootloader menu, since it either requires connecting a VGA monitor to the BMC, or finding it on the network and connecting to it.

Also, OS drivers for GPUs are all developed and tested under the assumption that the option ROM driver has executed when the OS takes over. This means any errata handling or OEM tweaks may not take effect as they would otherwise.

For AMD cards, it is possible to address this problem by replacing the x86 EFI driver on the card with an AArch64 one (or adding one if only a legacy x86 driver was there in the first place).

  1. Build flashrom from the ati branch at https://github.com/ardbiesheuvel/flashrom
  2. Check out edk2 (https://github.com/tianocore/edk2) and build the base tools (make -C BaseTools/)
  3. Dump the option ROM (as root)

    # ./flashrom -p ati_spi -r vbios.rom

    (Note that the ati_spi driver is currently hardcoded to support PCI domain/segment #0 only.)

  4. List the table of contents
    $ edk2/BaseTools/Source/C/bin/EfiRom -d vbios.rom 
    Image 1 -- Offset 0x0
      ROM header contents
        Signature              0xAA55
        PCIR offset            0x0244
        Signature               PCIR
        Vendor ID               0x1002
        Device ID               0x67DF
        Length                  0x0018
        Revision                0x0000
        DeviceListOffset        0x00
        Class Code              0x030000
        Image size              0xEA00
        Code revision:          0x0F32
        MaxRuntimeImageLength   0x00
        ConfigUtilityCodeHeaderOffset 0x4D41
        DMTFCLPEntryPointOffset 0x2044
        Indicator               0x00
        Code type               0x00
    Image 2 -- Offset 0xEA00
      ROM header contents
        Signature              0xAA55
        PCIR offset            0x001C
        Signature               PCIR
        Vendor ID               0x1002
        Device ID               0x67DF
        Length                  0x0018
        Revision                0x0000
        DeviceListOffset        0x00
        Class Code              0x030000
        Image size              0xE600
        Code revision:          0x0000
        MaxRuntimeImageLength   0x00
        ConfigUtilityCodeHeaderOffset 0x4F47
        DMTFCLPEntryPointOffset 0x2050
        Indicator               0x80   (last image)
        Code type               0x03   (EFI image)
      EFI ROM header contents
        EFI Signature          0x0EF1
        Compression Type       0x0001 (compressed)
        Machine type           0x8664 (X64)
        Subsystem              0x000B (EFI boot service driver)
        EFI image offset       0x0058 (@0xEA58)
    
    

    The first image contains the ATOM BIOS code, and should be kept – it is loaded by the OS driver when it probes the card. The second image is the existing x86 EFI driver.

  5. Create a compressed option ROM image from the AMD GOP driver for arm64 (https://www.amd.com/en/support/kb/release-notes/rn-aar), using the vendor and device IDs taken from the output above.
    $ edk2/BaseTools/Source/C/bin/EfiRom -f 0x1002 -i 0x67DF -l 0x030000 -ec Arm64Gop_1_68.efi

    This will create a file called Arm64Gop_1_68.rom in the current directory.

  6. Copy the ROM image to a new file, and store the old one for safe keeping.

  7. If the ROM image has sufficient space for an additional driver and you want to keep the x86 EFI driver, or if it does not have an EFI driver in the first place, you will need to use a hex editor to find the 0x80 ‘indicator’ byte at offset 22 of the last header that starts with PCIR, and change it to 0x0.

  8. Insert the AMD GOP option ROM image into the copy of the option ROM

    $ dd if=Arm64Gop_1_68.rom of=vbios_new.rom conv=notrunc seek=<byte offset>

    where byte offset equals offset + size of the preceding image. Note that this must be a multiple of 512. For the image above, this would be 0xea00 + 0xe600, assuming we are keeping the x86 driver (or 0xea00 if not).
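    The arithmetic is easy to sanity check in a shell, including the multiple-of-512 requirement (values taken from the EfiRom dump above):

    ```shell
    # New image offset = offset + size of the preceding image.
    printf '%#x\n' $(( 0xea00 + 0xe600 ))          # prints 0x1d000
    printf '%d\n'  $(( (0xea00 + 0xe600) % 512 ))  # prints 0, i.e. 512-byte aligned
    ```

    Note that 0x1d000 matches the offset of Image 3 in the dump of the new ROM below.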

  9. Dump the new image contents to double check that everything looks as expected.
    $ edk2/BaseTools/Source/C/bin/EfiRom -d vbios_new.rom 
    Image 1 -- Offset 0x0
      ROM header contents
        Signature              0xAA55
        PCIR offset            0x0244
        Signature               PCIR
        Vendor ID               0x1002
        Device ID               0x67DF
        Length                  0x0018
        Revision                0x0000
        DeviceListOffset        0x00
        Class Code              0x030000
        Image size              0xEA00
        Code revision:          0x0F32
        MaxRuntimeImageLength   0x00
        ConfigUtilityCodeHeaderOffset 0x4D41
        DMTFCLPEntryPointOffset 0x2044
        Indicator               0x00
        Code type               0x00
    Image 2 -- Offset 0xEA00
      ROM header contents
        Signature              0xAA55
        PCIR offset            0x001C
        Signature               PCIR
        Vendor ID               0x1002
        Device ID               0x67DF
        Length                  0x0018
        Revision                0x0000
        DeviceListOffset        0x00
        Class Code              0x030000
        Image size              0xE600
        Code revision:          0x0000
        MaxRuntimeImageLength   0x00
        ConfigUtilityCodeHeaderOffset 0x4F47
        DMTFCLPEntryPointOffset 0x2050
        Indicator               0x00
        Code type               0x03   (EFI image)
      EFI ROM header contents
        EFI Signature          0x0EF1
        Compression Type       0x0001 (compressed)
        Machine type           0x8664 (X64)
        Subsystem              0x000B (EFI boot service driver)
        EFI image offset       0x0058 (@0xEA58)
    Image 3 -- Offset 0x1D000
      ROM header contents
        Signature              0xAA55
        PCIR offset            0x001C
        Signature               PCIR
        Vendor ID               0x1002
        Device ID               0x67DF
        Length                  0x001C
        Revision                0x0003
        DeviceListOffset        0x00
        Class Code              0x030000
        Image size              0xFA00
        Code revision:          0x0000
        MaxRuntimeImageLength   0x00
        ConfigUtilityCodeHeaderOffset 0x00
        DMTFCLPEntryPointOffset 0x00
        Indicator               0x80   (last image)
        Code type               0x03   (EFI image)
      EFI ROM header contents
        EFI Signature          0x0EF1
        Compression Type       0x0001 (compressed)
        Machine type           0xAA64 (AA64)
        Subsystem              0x000B (EFI boot service driver)
        EFI image offset       0x0038 (@0x1D038)
    
    

  10. Use flashrom with the -w switch to write the new image back to the device.
  11. In the UEFI shell, the ‘drivers’ command will list all the UEFI drivers that have been loaded. E.g.,

    D1 00014400 D - -  1  - AMD GOP ARM64 Release Driver, Rev. Offset(0x1D000,...

    where the ‘D’ in the third column signifies that it has attached to the device.

by ardbiesheuvel at 10 December 2020, 11:38

28 November 2020

Mark Brown

Book club: Rust after the honeymoon

Earlier this month Daniel, Lars and I got together to discuss Bryan Cantrill’s article Rust after the honeymoon. This is an overview of what keeps him enjoying working with Rust after having used it for an extended period of time for low level systems work at Oxide; we were particularly interested to read a perspective from someone who is both very experienced in general and has been working with the language for a while. While I have no experience with Rust, both Lars and Daniel have been using it for a while and greatly enjoy it.

One of the first areas we discussed was data bearing enums, which have been very important to Bryan. In keeping with a pattern we all noted, these take a construct that’s relatively commonly implemented by hand in C (or skipped as too much effort, as Lars found) and provide direct support for it in the language. For both Daniel and Lars this has been key to their enjoyment of Rust: it turns things that are good practice or common idioms in C and C++ into first class language features, which makes them more robust and allows them to fade into the background in a way they can’t when done by hand.

Daniel was also surprised by some omissions, some small such as the ? operator but others much more substantial, the standout being editions. These aim to address the problems seen with version transitions in other languages like Python, allowing individual parts of a Rust program to adopt potentially incompatible language features while remaining interoperable with older editions of the language, rather than requiring the entire program to be upgraded en masse. This helps Rust move forwards with less need to maintain strict source-level compatibility, allowing much more rapid evolution and helping deal with any issues that are found. Lars expressed the results of this very clearly, saying that while lots of languages offer a 20%/80% solution which does very well in specific problem domains but has issues for some applications, Rust is able to move towards much more general applicability by addressing problems and omissions as they are understood.

This distracted us a bit from the actual content of the article and we had an interesting discussion of the issues with handling OS differences in filenames portably. Rather than mapping filenames onto a standard type within the language and then having to map back out into whatever representation the system actually uses, Rust has an explicit type for filenames which must be explicitly converted on those occasions when it’s required, meaning that a lot of file handling never needs to worry about anything except the OS native format and doesn’t run into surprises. This is in keeping with Rust’s general approach to interfacing with things that can’t be represented in its abstractions: rather than hide them, it keeps track of where things that might break its assumptions are and requires the programmer to acknowledge and handle them explicitly. Both Lars and Daniel said that this made them feel a lot more confident in the code they were writing and that they had a good handle on where complexity might lie; Lars noted that Rust is the first language he’s felt comfortable writing multi-threaded code in.

We all agreed that the effect here was more about having idioms which tend to be robust and which both encourage writing things well and give readers tools to help them know where particular attention is required; no tooling can avoid problems entirely. This was definitely an interesting discussion for me with my limited familiarity with Rust; hopefully Daniel and Lars also got a lot out of it!

by broonie at 28 November 2020, 15:58

11 October 2020

Mark Brown

Book club: JSON Web Tokens

This month for our book club Daniel, Lars, Vince and I read Hardcoded secrets, unverified tokens, and other common JWT mistakes, which wasn’t quite what we’d thought when it was picked. We had been expecting an analysis of JSON web tokens themselves, as several of us had been working in the area and had noticed various talk about problems with the standard, but instead the article is more a discussion of the use of semgrep to find and fix common issues, using issues with JWT as examples.

We therefore started off with a bit of a discussion of JWT, concluding that the underlying specification was basically fine given the problem to be solved but that, as with any security related technology, there were plenty of potential pitfalls in implementation, and that sadly many of the libraries implementing the specification make it far too easy to make mistakes such as those covered by the article through their interface design and defaults. For example, interfaces that allow interchangeable use of public keys and shared keys are error prone, as is making it easy to access unauthenticated data from tokens without clearly flagging that it is unauthenticated. We agreed that the wide range of JWT implementations available and successfully interoperating with each other is a sign that JWT is getting something right in providing a specification that is clear and implementable.

Moving on to semgrep, we were all very enthusiastic about the technology: language independent semantic matching with a good set of rules for a range of languages available. Those of us who work on the Linux kernel were familiar with semantic matching and patching as implemented by Coccinelle, which has been used quite successfully for years both to avoid bad patterns in code and to make tree-wide changes; as the article demonstrates, it is a powerful technique. We were impressed by the multi-language support and approachability of semgrep, with tools like their web editor seeming particularly helpful for people getting started with the tool, especially in conjunction with the wide range of examples available.

This was a good discussion (including the tangential discussions of quality problems we had all faced dealing with software over the years, depressing though those can be) and semgrep was a great tool to learn about, I know I’m going to be using it for some of my projects.

by broonie at 11 October 2020, 18:50

09 September 2020

Naresh Bhat

Chapter2 - Building and Running: The Hello World Module


The hello world module is a very simple kernel module, and we will explore it in this blog. I am assuming that on your distribution (Debian, Ubuntu, Fedora, CentOS, etc.) you have installed all the dependency packages, the kernel source, and the header files of the running kernel. The sample hello.c kernel module can be written as follows:

#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("NARESH BHAT");

static int __init hello_init(void)
{
    printk(KERN_ALERT "Hello World!\n");
    return 0;
}

static void __exit hello_exit(void)
{
    printk(KERN_ALERT "Good Bye, Cruel World!\n");
}

module_init(hello_init);
module_exit(hello_exit);

  • When the module is loaded, hello_init is called; when it is unloaded, hello_exit is called by the running kernel
  • Hence the kernel version the module was compiled against should match the running kernel version
  • The kernel entry point is module_init and the kernel exit point is module_exit
  • The special macros MODULE_AUTHOR and MODULE_LICENSE declare the author of, and the license used for, the kernel module
  • The printk function is defined in the Linux kernel and made available to modules
  • The kernel needs its own printing function because it runs by itself, without the help of the C library
  • After insmod has loaded it, the module is linked to the kernel and can access the kernel's public symbols
  • You can use the insmod/rmmod utilities to load/unload the kernel module
  • The messages from printk go into one of the system log files, such as /var/log/messages
Compiling and Loading

The Makefile can be written as below

obj-m := hello.o
all:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
        make -C /lib/modules/$(shell uname -r)/build clean

How does the makefile work?
The obj-m := hello.o line is all the kernel build system needs: it says there is one module to be built from the object file hello.o. The resulting module is named hello.ko after being built from that object file.
The make command starts by changing to the directory given after -C (the kernel source directory), where it finds the kernel's top-level makefile. The M= option causes that makefile to move back into the module source directory before trying to build the modules target. This target in turn refers to the list of modules found in the obj-m variable.
The kernel developers have developed a sort of makefile idiom which makes life easier for those building modules outside of the kernel tree.

Loading and Unloading Modules

After the module is built, the next step is to load it.
  • The insmod utility does the module insertion. It loads the module's code and data into the kernel and, performing a function similar to ld, resolves any unresolved symbols in the module against the kernel's symbol table.
  • Load-time configuration of a module gives the user more flexibility than compile-time configuration, which is still used sometimes.
What actually happens when insmod is used with hello.ko?
  • insmod relies on a system call defined in kernel/module.c.
    • The function sys_init_module allocates memory to hold the module (memory allocated with vmalloc)
    • Then copies the module's text region into that memory region
    • Resolves references in the kernel module via the kernel symbol table
    • Then calls the module's init function
  • System calls are prefixed with sys_
What is the modprobe utility? Where is it used?
  • modprobe looks at the symbols the module references that are not currently defined in the kernel; it basically checks the kernel module's dependencies
  • If any such references are found, modprobe looks for other modules in the current search path that define the relevant symbols
  • rmmod removes a module, calling its module_exit function
  • lsmod lists the modules currently loaded in the kernel
How does version dependency arise while building the module?
  • One of the steps in the module build process is to link your module against a file (called vermagic.o) from the current kernel tree
    • When the module is loaded, the kernel checks the processor-specific configuration options for modules and makes sure they match the running kernel
  • This object contains a fair amount of information about the kernel the module was built for, including the target kernel version, the compiler version and the settings of important configuration variables
  • If you want to build a module against a specific kernel version, point the KERNELDIR variable at that kernel source directory
  • The version definitions are found in linux/version.h
  • The header file linux/module.h automatically includes it
  • module.h contains definitions of the symbols and functions needed by loadable modules
  • init.h is needed for the initialisation and cleanup functions
  • You need to include moduleparam.h to enable passing parameters to the module at load time
The Kernel Symbol Table
  • A convenient way to manage the visibility of your symbols
  • Reduces namespace pollution
  • Promotes proper information hiding
  • If your module needs to export symbols for other modules to use, the following macros are used:
EXPORT_SYMBOL(name);
EXPORT_SYMBOL_GPL(name);

The _GPL version makes the symbol available to GPL-licensed modules only.
  • These symbols are stored in a special part of the module executable (an ELF section) that is used by the kernel at load time to find the symbols exported by the module
Initialisation and Shutdown

static int __init initialisation_function(void)
{
        /* initialisation code here */
}
module_init(initialisation_function);
  • The initialisation function should be declared static to make it local to that file
  • The __init token tells the kernel the function is used only at initialisation time; once initialisation is done, the function's memory can be freed
  • There is a similar tag, __initdata, for data used only during initialisation
  • You will also see __devinit and __devinitdata in the kernel source; these translate to __init and __initdata only if the kernel has not been configured for hot-pluggable devices
  • module_init is the kernel entry point and you must use it; otherwise your module initialisation function is never called
  • Most registration functions are prefixed with register_
static void __exit cleanup_function(void)
{
        /*cleanup function */
}

  • The __exit modifier marks the code as being for module unload only (by causing the compiler to place it in a special ELF section)
  • __exit code is discarded if your module is built in or the kernel is configured to disallow module unloading; hence it can only be called at module unload or system shutdown time, and any other use is an error
  • module_exit is necessary to enable the kernel to find your cleanup function
Error handling
  • Error recovery is sometimes best handled with the goto statement
  • The error codes are negative numbers belonging to the set defined in <linux/errno.h>
  • You can define your own error codes
Module Parameters

  • Parameters can be assigned at load time by insmod or modprobe
  • They can also be read from the /etc/modprobe.conf file
  • Parameters are declared with the module_param macro defined in moduleparam.h
  • module_param takes 3 arguments: the name of the variable, its type and a permissions mask
  • The macro should be placed outside of any function, typically near the head of the source file
  • To declare an array of parameters use module_param_array(name, type, num, perm);
    • name - name of the array
    • type - type of the array elements
    • num - an integer variable
    • perm - permission value
  • The module loader refuses to accept more values than will fit in the array
Advantages and Disadvantages of Userspace Drivers
Advantages:
  • The full C library can be linked in
  • You can run a conventional debugger on the driver code
  • If a userspace driver hangs, you can simply kill it
  • User memory is swappable
  • If you want to write a closed-source driver, the userspace option makes it easier

Disadvantages:
  • Interrupts are not available in user space; you need to use signals
  • DMA is possible only by mmapping /dev/mem, as a privileged user
  • I/O ports are available only after calling ioperm or iopl
  • Response time is slower
  • If the driver has been swapped to disk, response time is unacceptably slow
  • The most important devices can't be handled in a userspace driver


Kernel Modules Versus Applications
  • Every kernel module registers itself in order to serve future requests; this is event-driven programming. Not all applications are event driven
  • A kernel module uses its exit function to do a clean exit, releasing all resources and freeing memory; otherwise they remain in the system until reboot. Applications can be lazy about exiting without affecting the system much
  • After initialisation, the module's init function terminates immediately
  • An application can call functions it doesn't define; the linking stage resolves external references using the appropriate library of functions. printf is one such callable function, defined in libc
  • A module, on the other hand, is linked only to the kernel, and the only functions it can call are the ones exported by the kernel; there are no libraries to link to
  • A kernel module can be unloaded; this modular approach cuts down development time
  • An application segfault is harmless; a kernel fault kills at least the current process, if not the whole system
User Space and Kernel Space
  • A module runs in kernel space, applications run in user space
  • In kernel space the OS must guard against unauthorised access to resources
  • Each space has its own memory mapping and address space as well
  • UNIX transfers execution from user space to kernel space on a system call or hardware interrupt
Concurrency in the Kernel
  • Most applications, with the notable exception of multithreaded ones, do not have to deal with concurrency; kernel code does
  • Linux can run on SMP systems, with the result that your driver could be executing concurrently on more than one CPU
  • Linux kernel code, including driver code, must be reentrant: it must be capable of running in more than one context at the same time
  • Data structures must be written carefully, taking care of memory corruption, concurrency and race conditions
The current process
  • Kernel modules don't execute sequentially as applications do
  • Kernel code can refer to the current process by accessing the global item current
  • The current pointer refers to the process that is currently executing
A few other details
  • Applications are laid out with a very large stack area, used to hold automatic variables and the function call history
  • The kernel has a very small stack, so it's never a good idea to declare large automatic variables; if you need larger structures you should allocate them dynamically at call time
  • Kernel code can't do floating point arithmetic
  • Enabling floating point would require the kernel to save and restore the floating point processor's state on each entry to and exit from kernel space; that extra overhead is not worthwhile


by Naresh Bhat at 09 September 2020, 18:00

09 September 2020

Naresh Bhat

Advanced C TIPS and TECHNIQUES



Data Presentation


  • How will you represent negative and positive numbers in binary format? With 8 bits, can you represent the positive numbers 0 to 127 and the negative numbers -1 to -128?

01111111  this is  127 (the MSB is 0 for the positive series)
...
00000001  this is    1
00000000  this is    0
11111111  this is   -1 (the MSB is 1 for the negative series)
...
10000000  this is -128

Using 2's complement notation we can assign a positive or negative value to each bit pattern.

  • What are C’s basic data types can you explain ?

C's basic data types are char, int, float, and double. Each size is machine dependent, but typical machines use 8 bits for a char, 32 bits for a float, and 64 bits for a double. The size of an int reflects the word size of the machine and is typically 16 or 32 bits.


  • What are qualifiers for basic data types ?

C provides long, short, and unsigned qualifiers for integer data types. A long is usually twice the size of an int on 16-bit machines, but longs and ints are often the same length on 32-bit and larger machines. Going the other way, a short is half the size of an int on 32-bit machines, and shorts and ints are typically the same length on 16-bit machines. unsigned treats the most significant bit as a data bit and applies to char, int, short, and long data types. C integer data types are signed by default.


  • What are the basic data types in C and What are derived data types in C ?

    • char, int, float, double; in addition there are a number of qualifiers such as short (16 bits) and long (32 bits). An int is either 16 or 32 bits depending on the compiler, which is free to choose

    • signed and unsigned are used as prefixes to char or any integer type. Unsigned numbers are always positive or 0; an unsigned char varies from 0 to 255

    • A signed char holds values between -128 and 127 (on a 2's complement machine). This is machine dependent, but printable characters are always positive

    • long double specifies extended precision floating point

    • The <limits.h> and <float.h> header files contain symbolic constants for all of these sizes, along with other properties of the machine and compiler

    • A character constant is an integer written as one character in single quotes, such as 'x'; its value is the numeric value of the character in the machine's character set. The character constant '\0' represents the NULL character and has the value zero

    • A constant expression is an expression which involves only constants. Such expressions may be evaluated at compile time rather than run time

    • A string constant, or string literal, is a sequence of zero or more characters surrounded by double quotes, as in "I am a string" or ""

    • There is one more kind of constant, the enumeration constant: enum boolean {NO, YES}; values start from 0, 1 unless specific values are assigned


  • Explain the data representation in C (signed data and unsigned data)

The MSB decides the sign of signed data; negative values are represented as 2's complement numbers.


Bit Pattern  Value

01111111      127
...
00000001        1
00000000        0
11111111       -1
11111110       -2
...
10000000     -128


Functions


  • How will you distinguish between a character constant and a string?

    • A character constant is surrounded by single quotes, 'x', and a string is "x". 'x' is an integer, used to produce the numeric value of the letter x in the machine character set. "x" is an array of characters containing the one character x followed by '\0'


  • What are the 2 basic function usage rules?

    • The compiler does not convert function arguments between float and int (or vice versa) unless it has seen a declaration. For example, to use sqrt() in your C program you should include the <math.h> header file; otherwise the output will be meaningless

    • If a function returns a type other than int, you must declare it before calling it


Operators and Expressions

  • The % operator does not apply to two of the basic data types; what are those?

    • float, double


  • What are different operators in C

    Arithmetic, Relational, Logical, Bitwise, Assignment, Conditional, Increment and Decrement operators


Control Flow Constructs

  • What are different kinds of control flow constructs in C

    • if, if-else, while, do-while, for, switch, break, continue, goto

  • What is a null statement and how is it terminated?

In C, statements are terminated with a semicolon. C allows null statements (i.e. a semicolon with nothing else).


Array and Pointers
  • What is an array? What are single-dimensional and multidimensional arrays? Can you explain how the following will be stored and what its size will be?

char gree[] = "hello";

sizeof(gree) ----> is 6, because 5 characters + 1 null terminator appended by the compiler


  • What is a pointer ? What is indirection operator ? What is size of a pointer ? Can you explain with an example pointer to a pointer ?

int i = 3;
int *p = &i;
int **q = &p;


  • Can you explain command line arguments with an example ?


    Structures and Unions

  • What is the difference between a Structure and a Union?

    A union is allocated only enough memory for its biggest member, while a structure is allocated memory for each member.


  • What are the different types of operators C provides to access a structure ?

    dot and arrow


    Storage Classes

  • What are the different types of storage classes?

Automatic, Static (internal static is local to the file or block; external static is global), Register, Extern

Before a program uses an auto variable's value, the variable should be given an initial value.

A static variable has the value ZERO before the program starts running, and zero-initialised statics are stored in the BSS part of the data segment.


A variable is internal static when you declare it inside a block with the keyword static. Note that the static storage class does not affect the variable's scope.


The address operator (&) cannot be applied to a register variable. Use of register variables is highly machine and compiler dependent. Register variables are handy for time-critical code.

  • Initializing a static variable is a one-step process, while initializing an automatic variable is a two-step process. Can you explain with an example?

    • For example:

{
    int var1 = 3;        /* automatic variable */
    static int var2 = 4; /* internal static variable */
}

For var1 the compiler allocates storage on the stack and generates assembly code to load the constant 3 into that location each time the block is entered.
var2 lives in initialized data. The initialization is the system's job, not the compiler's: the value 4 is loaded into memory before your program runs. Hence initializing a static variable is a one-step process.

  • Difference between initializing a char array and a pointer

char buf[] = "hello.";
char *p = "goodbye";


buf is most likely declared outside a function.
buf is the address of a char array holding the 6 characters plus the '\0' at the end.
buf is an array name, which is constant; this is important because you can't assign it a new string:
buf = "new string"; /* illegal, does not compile */
because it is something like trying to write 5=6;
buf is the address of the first array element.


p is a pointer to a char string.
If you change the value of p, you change what it points to:
p = "new string";


  • What will happen when you make a function static?

    If you make a function static in a kernel driver, the function name will not be exported to the kernel symbol table.

  • When the compiler runs out of physical CPU registers, what happens to the variables you declared as register?

They become automatic.


  • What are the advantages/disadvantages of register variables?

    Disadvantages:

    You can't apply the & (address) operator to register variables

    Highly machine and compiler dependent

    Registers are a very limited resource; when none is available, a register-declared variable becomes automatic


    Advantages:

    Executes faster

    Best suited for time-critical functions


Preprocessor Directives and Macros

  • Explain the trade-offs of preprocessor macros

Advantage - macros execute faster; Disadvantage - they increase code size

  • What is a cast and why is it used?

A cast is an operator. It converts the result of the evaluated expression to the specified data type at run time. The values of the variables inside the expression are unaffected.

(type) expression;


  • What are the 4 program areas of the run-time environment? (Text, stack, data and heap areas)

  • What is a stack frame?

    • When the system calls a function, it pushes the function's arguments, local variables and return address on the stack. We call this information a stack frame

  • Can you explain how function pointers are stored? How is the stack frame constructed when a variable's address is passed to a function?

  • Do you know which pointer into the stack frame the compiler uses to generate assembly code?

    • The compiler generates assembly code using the base pointer of the current stack frame, referencing function arguments with a positive offset and automatic variables with a negative offset

  • What are the different ways to call a function? Can you explain the difference between call by address and call by value?

    • A call-by-value function uses the copies stored in the current stack frame and has no way to refer to or alter the caller's variables. This is the main difference between passing the address of a variable and passing its value.

  • How does C reduce the runtime overhead in the case of automatic variables?

  • What is a volatile variable? Explain its effect w.r.t. the compiler and w.r.t. caches.

  • What are atomic operations? Do you need hardware support for atomic operations?

  • What is a recursive function? What happens when a recursive function is called unconditionally?

    • A function that calls itself is called recursive. If it recurses unconditionally, you will run out of stack space.


  • Can you explain non-local control transfer?

    • C limits the destination of a goto to within the same function. The C library functions setjmp() and longjmp() provide the capability to transfer control to another function.


  • Explain how setjmp() and longjmp() work, and the context switch they perform.

    • setjmp() saves the program environment. It is machine dependent because the routine has to access the machine registers and save them in a jmp_buf array. The first time setjmp() is called, it returns 0. It is not necessary for setjmp() to save stack frames or anything else.

    • longjmp() restores the program environment: it restores the machine registers saved by setjmp() and forces the machine to execute a different set of instructions. Instead of executing the next line in your program, control now returns from setjmp() with a non-zero value. This changes the program's execution path, and we call it a context switch. Note that longjmp() is declared as void and you never return from a longjmp() call.



C Debugging Techniques 

  • Overview of C debugging
    • C Preprocessors
    • ASCII and Hex debug display
    • Signals with debug output
    • Assertions
    • Selective Debug prints
    • Customised memory allocators

The Standard C compiler Under UNIX


Command cc file naming conventions


pgm.c - Source file

pgm.o - Object file (unlinked)

pgm.s - Assembly language source file

lib.a - Archive (library) file


Commandline options affecting the cc


Compiler Options

-O Optimize

$ cc -O prog.c sup1.c quick.s dbug.o 

compiles and optimizes prog.c and sup1.c, assembles quick.s and links in dbug.o. What the optimizer actually does is completely compiler dependent.


-c Suppress the link edit step

$ cc -c objalloc.c debug.c 

This command creates two object files, objalloc.o and debug.o; with the "-c" option neither of them is executable.

$ cc objalloc.c debug.c  

which creates a.out 


-S Assembly language listing

$ cc -S -O idim2.c 

This option creates an assembly language listing and suppresses the link edit step.

 

Link Editor Options

-o Rename Object/Executable file

$ cc prog.c objalloc.c debug.o -o prog 

This creates an executable file with the name prog.

 

-l Name Libraries

$ cc servo.c -o servo -lm  

-lm makes the link editor search the math library.


-i Sharing text space

$ cc -i database.c -o database

database is a pure object code file; that is, the loader places the text and data segments in separate areas of memory. E.g. the shell /bin/sh and the editor /bin/vi are compiled with the -i option. These programs have a read-only text space that users share from the same in-memory copy. Each user gets its own data area, but not its own text area.


-s Strip an Executable Object file

Once the program is debugged you may want to reduce the size of its executable object file.  The -s option makes the link editor "strip" the relocation and symbolic information from an object file.


Preprocessor Options

-E Debugging the Preprocessor

$ cc -E setbit2.c

Makes the preprocessor display what it is doing on standard output. This is the most useful option for debugging preprocessor problems.


-P Preprocessor File Output

$ cc -P mdim2.c

makes the preprocessor dump its output to a file


-D Preprocessor Defines

In the below example,

#ifdef DRIVER

main(){

....

}

#endif


The -D option compiles in the driver:


$ cc -DDRIVER dbug.c -o dbug


You can also use -D to set constants. Instead of writing


#define VERSION 5.0


in the source, define it on the command line:

$ cc -c dbug.c -DVERSION=5.0 


-I Preprocessor include files

$ cc -c -O -I /usr/atl/include objalloc.c dbug.c

The -I option makes the preprocessor search for include files in a directory other than /usr/include:


#include "obj.h"

#include "dbug.h"


If the files are not found in the current directory or the -I directory, the preprocessor searches under /usr/include.

Another use is to substitute your own macros and header files for the ones in /usr/include.


Compiler Debugging Options

-g Symbolic Debugging

$ cc -g -c objalloc.c dbug.c

$ cc -g -DDRIVER strmob.c objalloc.o dbug.o -o strmob


UNIX System V provides a symbolic debugger called sdb, and BSD provides one called dbx.


-p Performance Analysis

$ cc -p -c objalloc.c dbug.c

$ cc -p -DDRIVER strmob.c objalloc.o dbug.o -o strmob

$ prof -g strmob


UNIX System V also provides a tool called prof to study where a C program spends its time and how many times it calls each routine.  This tells you which routines to improve to increase the performance of a program.

  • First, compile the program with the -p option
  • Second, run your program; it writes the profiling information to a file called mon.out
  • Then run the prof tool to gather the profiling information and display it for you


The Run Time Environment

C is special in that you tell the compiler where to place a variable in the runtime environment.  The compiler doesn't decide, you do. This gives you more control over how your program executes, but it means the programmer must have a clear idea of what he/she is doing. That is why it makes sense to find out how a C program executes.  Understanding the runtime environment helps you diagnose runtime bugs and write faster code.


Program Areas

  • The four program areas of the runtime environment are the
    • text area
    • stack
    • data area
    • heap area
  • The text and data areas are fixed in size.  The stack and heap areas are dynamic: a program uses them as it grows. The stack and heap sit at opposite ends of memory.
Program areas at run time



Text Area
  • Reserved for executable instructions of a program
  • Is made read-only by the system and is often called program text
  • C compilers typically have options to place text and data in separate spaces; such a program is sometimes called pure. It is possible for impure programs to modify themselves
  • The size of text area is fixed, determined at the time the program is compiled and linked.
  • The only time a program should access data in this area is with pointers to functions.
Data Area
  • It is also called static data
  • It holds static variables (internal and external) as well as global variables from your program
  • It is divided into two parts: initialised and uninitialised
  • Where the system stores a variable depends on how you declare it:
    • If a static or global variable is initialised, it goes into the initialised area
    • Otherwise it goes into the uninitialised area
  • The uninitialised data area is sometimes called the BSS (Block Started by Symbol)
  • The system fills the BSS area with zeros
  • We refer to variables in the data area as load-time variables because the system allocates their memory at load time
  • Load-time variables have predetermined values when your program begins running, because the system allocates memory for them, initialises them with the predetermined values, and fills the BSS area with zeros
The Stack
  • The stack is the basis for what's called a stack frame, a mechanism many compilers use to implement function calls.
  • When you pass a parameter to a function, the stack frame makes the data accessible to the function.
  • The compiler stores automatic variables in the stack area
  • Since this happens at run time, the system can't determine the size of the stack before the program runs.
  • The size of the stack changes as the program runs
  • The stack is constantly growing and shrinking during the life of a program
The Heap
  • A heap manager, separate from your program, provides C library functions that allocate and deallocate storage.
  • Pointers allow you to access heap memory as if it were an array, a structure or any C datatype
  • The heap is dynamic and under the programmer's control
  • It is possible to exhaust the heap space while a program runs
  • It is the programmer's responsibility to address heap data properly and to detect heap overflow errors

The TEXT area

Pointers to Functions

  • A C program does not usually access the text area unless you use pointers to functions
  • Let's review pointers to functions
int (*p) ();
  • The parentheses surrounding *p are needed so that the indirection operator binds to p
  • Once you have declared the pointer, you must initialise it
int func();
  • This informs the compiler that func is a function that returns an integer
  • It's not necessary to list the function parameters here; the compiler just wants to know the function name and return type
p = func;

  • Note that p must be the same datatype that func() returns; otherwise a cast is necessary
  • Parentheses are not used after the function name here; if you used them, the compiler would try to call the function
  • The second way is to declare and initialise in the same statement
int func(), (*p)() = func;
  • Regardless of how you do it, the pointer contains an address from the text area

The Stack

  • What is a Stack ?
    • A simple data structure that stores values; it is the best example of last-in/first-out, and also of a linked list (because the base pointer of the current stack frame contains the address of the next stack frame pointer)
    • The stack is used to store addresses or data
    • The system typically pushes and pops items on and off the stack in a last-in/first-out manner
  • What does C store on the stack ?
    • Automatic variables
    • Function parameters
    • Return addresses of functions
    • Temporary variables during expression evaluation
  • What is a stack frame ?
    • When your program calls a function, the system pushes important information like the function's arguments, local variables, and return address onto the stack.  We call this a stack frame
    • A separate stack frame is created each time the system calls a function.
    • Not all C compilers use stack frames; some of them use registers, which are faster to access than stack memory.
    • The compiler generates assembly code that uses the base pointer of the current stack frame to
      • reference function arguments with positive offsets
      • reference automatic variables with negative offsets
  • What is the difference between calling a function with an address as a parameter and passing by value ?
    • In pass by value, the function uses a copy of the values stored in the frame.
    • In pass by address, the function uses the address of the variables, so changes are reflected in the caller's variables
  • How does printf() handle the value of val and the address of buf ?
    • Since we assume function arguments are pushed in reverse order on the stack, a pointer to the format string "%d %s" appears on top.
    • printf() can use this information to determine how many arguments were passed and whether to interpret them by address or by value
  • What about stack requirements ?
    • With pass by address, the stack frame only requires the size of a pointer to the specified data object.
    • Passing the address of an object, therefore, saves stack space and minimises the work of setting up a stack frame
  • What happens when we pass a structure ?
    • The system duplicates the entire structure, resulting in a larger stack frame
    • The compiler generates assembly code to copy all of the data to the new stack frame as well.

Initialising Automatic Variables

  • Why aren't automatic variables initialised ?
    • If C initialised automatic variables, the compiler would have to generate additional assembly code to load each variable with a value.
    • Another reason is that C allows goto's to jump into the middle of a block inside the same function. If a variable is declared inside that block, what would its value be? If the compiler initialised automatic variables, it would have to take this special case into account.
For all these reasons C does not initialise automatic variables by default.  Their values are simply whatever is on the stack at run time.  This illustrates one of C's basic philosophies: reducing runtime overhead.
Recursive functions and stack overflow: when a recursive function is called too often, you may run out of stack space (a condition called stack overflow)



Nonlocal Control Transfers


What happens when you need to transfer program control to a different function ?

  • The C library functions setjmp() and longjmp() provide this capability
  • The header file setjmp.h defines jmp_buf
  • Programs must use setjmp() and longjmp() in pairs
  • The setjmp() routine is called to save the program environment
  • setjmp() returns 0 when you call it the first time; otherwise it returns the value of longjmp()'s second argument, called somewhere else in your program
  • That's why you need to check the setjmp() return value after you call it
How does setjmp() save the program environment ?
  • It is machine dependent, because it has to read the machine registers and save them in the jmp_buf array
  • It's not necessary to save the stack frame or anything else
How does longjmp() restore the program environment ?
  • Calling longjmp() restores all the original machine registers saved by setjmp(), forcing the machine to execute a different set of instructions
  • The program now returns from setjmp() with a nonzero value. This changes the execution path; we call it a context switch
  • Note that longjmp() is declared void; you never return from a longjmp() call

The Data Area

  • Holds the program's data
  • Whatever the variable or data that you place here exist for the life of your program
  • The data area has two sections
    • One of them is set aside for variables that you initialise
    • The other for variables that you don't
  • Allocation
    • Internal and External static variables
    • Global variables
    • Initialised arrays and structures
    • Constant strings
  • Internal static means a static variable declared inside a function or block
  • A static variable private to the file where it is declared is called external static
  • Global variables can be accessed outside the file
  • Constant strings are compiler dependent; most C compilers place them in the data area.
  • Some C compilers give you an option for placing constant strings in the text or data area.
BSS (Block Started by Symbol)


The data area is divided into two separate sections; static variables may appear in either one.

  • A static variable without initialisation goes into a special part of the data area called the BSS
  • Otherwise it is placed into the other section, initialised data
  • Separating initialised from uninitialised data is done by the loader
  • C guarantees that uninitialised static and global variables have zero values before a program runs
  • A variable's scope has nothing to do with the BSS
  • The compiler marks these variables so that the loader groups them together at link time
  • When it's time to run the program, the system loads the BSS variables into a section of memory and zero-fills it all at once.  This is faster than initialising one variable at a time.
  • Constant strings are never placed in the BSS because they must be initialised
Initialised data
  • Contains the static and global variables that you explicitly initialise in a C program
  • The system initialises each variable in memory before the program runs
  • Constant strings, if they are stored in the data area, appear in initialised data.
Why is this important ? Why is there no runtime overhead for initialising static and global variables ?

Consider an example,

{
    int var1 = 3;            /* automatic */
    static int var2 = 4;     /* internal static */
    ...
}


  • var1 is automatic, so the compiler allocates storage for it on the stack. Following this, the compiler generates assembly code to load the constant 3 into that location.  This is a two-step process.
  • var2 lives in initialised data,
    • Initialisation is the system's job, not the compiler's
    • the value 4 is loaded into memory for var2 before the program runs
    • therefore, initialising a static variable is a one-step process
  • There is no runtime overhead for initialising static and global variables, because the compiler does not produce assembly code for the initialisation
What about a static declaration inside a loop ?

  • It retains its previous value each time through the loop

/* loop2.c - creates static variable */

#include <stdio.h>

main()
{
   int i;

   for (i = 0; i < 3; i++) {
           static int j = 10;

           printf("i = %d\tj = %d\n", i, j);
           j++;
   }
}

$ loop2

i = 0 j = 10

i = 1 j = 11

i = 2 j = 12


Constant Strings

  • C provides character constants and constant strings
  • When you enclose an ASCII character in single quotes, the compiler uses a constant and not a memory location, e.g. the compiler generates 6b (hex) for 'k'
  • Constant strings are allocated one byte for each character and one additional byte at the end for the NULL character
  • C allows constant strings in the following contexts
    • Initialising a character pointer - char *boy = "kellen";
    • Assignment to a character pointer - boy = "kellen";
    • Function calls with character pointers - puts("kellen");
    • Character pointer references
Initialising a character array

char girl [] = "sara";
or
char girl[] = {'s', 'a', 'r', 'a', '\0'};

will not be treated as a constant string.

/* strcon.c - constant strings as arrays and pointers */

main()
{
      printf("%s\n", "hello again" + 6);

      printf("%c\n", *("12345" + 3));

      printf("%c\n", "12345"[1]);
}

$ strcon

again

4

2


There are a few more interesting problems.


/* dup2.c - duplicate strings */

main()

{

   char *p, *q;

   p = "string";

   q = "string";

   p[3] = 'o';

   printf("%s %s\n", p, q);

}

$ dup2

strong string


The Heap

  • Allows programmers to allocate memory dynamically
  • It is possible, however, for the heap and the stack to share the same memory segment
  • The stack grows down and the heap grows up in memory
  • Both areas therefore need to be monitored
  • The heap is controlled by a heap manager, which allocates and deallocates memory
  • User programs interface with the heap via C library calls
  • The following functions are available

TABLE 2-1. C library calls for the heap


Routine                          Meaning

char *malloc(size)               allocate storage for size bytes
unsigned size;

char *calloc(n, size)            allocate and zero storage for n items
unsigned n, size;                of size bytes

char *realloc(pheap, newsize)    reallocate storage for old heap pointer
char *pheap;                     pheap, for newsize bytes
unsigned newsize;

void free(pheap)                 free storage for heap pointer pheap
char *pheap;


  • malloc() returns a heap address; on failure it returns NULL
  • calloc() fills the heap memory with zeros and therefore runs slower than malloc()
  • calloc() returns a heap address; on failure, NULL
  • realloc() allows you to change the size of any object on the heap; it can increase or decrease an object's size in heap memory
  • Occasionally, realloc() returns the same heap address
  • free() doesn't return anything
Initialising Program Variables


  • Illegal initialisation - buf lives on the stack, so its address is not known at load time


main ()

{

    char buf[5];                    /* stack */

    static char *p = buf;       /* initialized data */

}


  • This one correctly initializes a pointer variable at load time.


main ()

{

    static char buf[5];       /* BSS - buf initialized with zero */

    static char *p = buf;   /* initialized data area */

}

  • Automatic variables on stack area

main()

{

    char buf[5];      /* stack, buf is stored with junk values */

    char *p = buf;  /* stack */

}

  • Static buf and automatic pointer 

main()

{

    static char buf[5];   /* BSS - buf initialised with zero */

    char *p = buf;       /* stack */

}


Use with Functions


  • Sometimes you want to declare a variable and initialise it with a function call. With automatic variables, this is legal.

int length = strlen(argv[1]);

  • You can't initialise a static variable to the value of another variable, and you can't use a function's return value either, because both actions are run-time events
static int length = strlen(argv[1]);  /* Doesn't compile */
  • You can split the above statement into two, which works

static int length;                 /* BSS */

length = strlen(argv[1]);



What about pointers to functions ?

  • You can initialise both automatic and static pointers to functions, because function addresses are known at load time
main() {
    int f();
    int (*p)() = f;           /* auto, stack area */
    static int (*q)() = f;    /* initialised data area */
    ...
}

f() {                         /* text */
    ...
}

  • Do not return the address of an automatic variable from a function (refer page 104 for example)

Summary

  • C programs use the text, stack, data, and heap program areas of the run time environment. It's possible that a program will not use the data or heap areas.
  • The text area is normally write-protected from an executing program. Pointers to functions are addresses from the text area.
  • The compiler uses the stack for function return addresses, stack frames, and intermediate storage.
  • The compiler also uses the stack for automatic variables and recursive functions.
  • The data area contains program variables that store values for the life of your program.
  • The BSS part of the data area contains the uninitialized static and global variables from a C program.
  • All static and global variables in the BSS have zero values before your program runs.
  • Constant strings typically live in the data area, but some compilers may place them in the text area.
  • The compiler usually makes separate copies of a single constant string in the data area if you define it more than once. The implementation of constant strings is compiler and machine dependent.
  • The heap is memory you control from C library routines. Heap memory is preserved through function calls.
  • The heap and stack sometimes share the same memory area; hence, a large amount of heap space may decrease the amount of available stack space, and vice versa.
  • The text area and data areas are fixed in size when your program runs. The stack and heap are dynamic.


An Array of Choices

  • The compiler treats array names as constants
  • An array name is a pointer to the array's first element
  • a[i] == *(a+i) == *(i+a) == i[a]
  • C allows two arithmetic operators (+ and -) between a pointer and an integer expression
  • Compact pointer expressions: We call pointer expressions that use indirection (*) and auto increment (++) or auto decrement (--) compact pointer expressions.  In most cases the compiler produces assembly code that is smaller and runs faster.
Expression    Operation         Effect
--------------------------------------------------
*p++          post increment    pointer
*p--          post decrement    pointer
*++p          pre increment     pointer
*--p          pre decrement     pointer

++*p          pre increment     object
--*p          pre decrement     object
(*p)++        post increment    object
(*p)--        post decrement    object
--------------------------------------------------
  • You can't use pointers to structures, unions or functions with these expressions
  • Negative subscripts:
&a[n] = a + n = (char *)a + n * sizeof(object)
Now, substituting 0 for n, we have
&a[0] = a = (char *)a

  • This explains why array subscripts start at ZERO.
  • The compiler always uses the base address as the start of the array
  • What happens when n is negative ?
&a[-n] = a - n = (char *)a - n * sizeof(object)

The pointer then points below the base address of the array.  This shows that negative subscripts are legal in C.


Programming questions

  • Program to find a number palindrome or not
  • A program logic to take 100 digit number as input and increment it by 1
  • Write a program to convert integer number to hex
  • Write a program to set, unset the nth bit 
  • Write a program to count number of char in a string
  • Write a program to find the size of an integer without using sizeof
  • Write a macro to find the MIN of two variables, then three variables.
  • Write a program to find all combinations of the lucky number for your vehicle registration.
  • A lucky number is formed using one's birthday. For example, if the birthday is 15th Mar 1944:

1 + 5 + 0 + 3 + 1 + 9 + 4 + 4  = 27

2 + 7 = 9

Lucky number is 9.

Now, given the plate pattern

KA-01, MF - XXXX 


where XXXX is any combination of digits which add up to 9, e.g.:

8001 => 8 + 0 + 0 + 1 = 9 

0711 => 0 + 7 + 1 + 1 = 9



---------------------------------------------- END -----------------------------------------------------------------------








by Naresh Bhat at 09 September 2020, 06:46

25 August 2020

Gema Gomez

My new overlocker

One thing that became clear during lockdown is that hobbies are to be cherished. Sewing was one of mine that had been dormant for quite some time. When I started trying to sew a dress a couple of months back I realised how tedious it was to try to sew neatly on the inside of garments with just my sewing machine. That is when I started to look at overlockers.

Originally I thought that a coverstitch + overlocker combo machine was the way to go, but after some research it became clear that both those functions really belonged on different machines, and since space was not a problem for us at this point, I decided to buy an overlocker only machine and think about a coverstitch machine at a later time. This is what my Bernina L460 looks like:

Overlocker

For those of you who haven’t thought about which machine is the one that makes the stitching you see inside most of your store bought garments, this is what the overlocker stitching looks like: Sample sewing

It has two needles (that can be used together or not, depending on what stitch you are after) and two loopers, an upper looper and a lower looper. There is no bobbin. There is also a blade that cuts the fabric and a little tray that collects the scraps of fabric as you sew. This is quite different from a sewing machine but also really powerful. It leaves very neat edges (no fraying). It uses up to four different threads at any given time and the cutting length as well as a few other things are adjustable. There is pretty much no turning on corners, so you sew in straight or curvy lines and then go again from the other side if you need to turn a 90 degree-ish corner.

This is what the machine looks like on the inside:

Open overlocker

I have been sewing some masks and some home cushions with it to get acquainted and I couldn’t be happier with the result, it certainly is a very good tool for my sewing toolbox!

by Gema Gomez at 25 August 2020, 23:00

22 August 2020

Naresh Bhat

Chapter 12 - PCI Drivers


PCI Drivers

  • The PCI bus achieves better performance than ISA by using a higher clock rate
  • Each PCI peripheral is identified by a bus number, a device number, and a function number
  • The PCI spec permits a single system to host up to 256 buses. Since that is not sufficient, Linux now supports PCI domains
    • Each PCI domain can host up to 256 buses
    • Each bus hosts up to 32 devices
    • Each device can be a multifunction board, e.g. an audio device with a CD-ROM drive, with a maximum of 8 functions
    • Each function can be identified at hardware level by a 16-bit address, or key
    • Device drivers in Linux don't need to deal with those binary addresses, because they use a specific data structure, called pci_dev, to act on the devices
  • Bridges are special-purpose peripherals whose task is joining two buses
  • The overall layout of a PCI system is a tree, where each bus is connected to an upper-layer bus, up to bus 0 at the root of the tree
  • lspci is part of the pciutils package, and the layout information is available in /proc/pci and /proc/bus/pci
  • The sysfs representation of PCI devices also shows the addressing scheme
  • The hardware circuitry of each peripheral board answers queries pertaining to three address spaces
    • Memory locations
    • I/O ports
    • Configuration registers
  • The memory and I/O address spaces are shared by all the devices on the same PCI bus
  • All devices on the PCI bus see a bus cycle at the same time.
  • The configuration space, on the other hand, exploits geographical addressing
    • configuration queries address only one slot at a time, so they never collide.
How do PCI device reads/writes and interrupts work ?
  • The memory and I/O regions are accessed in the usual ways, via inb(), readb(), and so on
  • Configuration transactions are performed by calling kernel-specific functions to access configuration registers
  • Every PCI slot has 4 interrupt pins, and each device function can use one of them without being concerned about how those pins are routed to the CPU
  • The I/O space on a PCI bus uses a 32-bit address (leading to 4 GB of I/O ports)
  • The memory space can be accessed with either 32-bit or 64-bit addresses
  • The software could configure two devices to the same address space, so regions must be mapped to avoid collisions
  • Every memory and I/O region offered by an interface board can be remapped by means of configuration transactions
Who initialises a PCI device during boot, and how is it ready to be accessed later ?
  • The firmware initialises the PCI hardware at system boot, mapping each region to a different address to avoid collisions
  • The addresses to which these regions are currently mapped can be read from the configuration space, so the Linux driver can access its devices without probing
  • After reading the configuration registers, the driver can safely access its hardware

  • The PCI configuration space consists of 256 bytes for each device function
  • PCIe has a 4 KB configuration space for each function
  • Four bytes of the configuration space hold a unique function ID, so the driver can identify its device by looking for the specific ID for that peripheral
    • Each device board is geographically addressed to retrieve its configuration registers
    • The information in those registers can then be used to perform normal I/O access
    • without any need for further geographic addressing
  • The main innovation of the PCI interface over ISA is the configuration address space
  • In addition to the usual driver code, a PCI driver needs the ability to access the configuration space, in order to save itself from risky probing tasks.
Boot Time
  • When power is applied to a PCI device, the hardware remains inactive.
  • The device responds only to configuration transactions
  • At power on, no memory and no I/O ports are mapped
  • Every other device-specific feature, such as interrupt reporting, is disabled as well.
  • Every PCI motherboard is equipped with PCI-aware firmware, called the BIOS, NVRAM, or PROM, depending on the platform
  • The firmware offers access to the device configuration address space by reading and writing registers in the PCI controller.
At system boot,
  • The firmware (or the Linux kernel, if so configured) performs configuration transactions with every PCI peripheral in order to allocate a safe place for each address region it offers.
  • By the time a device driver accesses the device, its memory and I/O regions have already been mapped into the processor's address space.
  • The driver can change the default assignment, but it never needs to do that
  • tree /sys/bus/pci/devices/0000:00:10.0
    • config - a binary file that allows the raw PCI configuration information to be read from the device
    • The files vendor, device, subsystem_device, subsystem_vendor, and class all refer to the specific values of this PCI device.
    • The file irq shows the current IRQ assigned to this PCI device
    • The file resource shows the current memory resources allocated by this device
Configuration Registers and Initialisations

  • All PCI devices have a 256-byte address space.  The first 64 bytes are standardised, while the rest are device dependent
  • The PCI registers are always little-endian
  • The driver writer should be careful about byte ordering, although most of it is taken care of by the Linux developers, because PCI was designed toward the PC environment
  • If you want to convert data from host order to PCI order or vice versa, you can resort to the functions defined in <asm/byteorder.h>
  • The technical documentation released with each device describes the supported registers
  • Three or five registers identify a device
    • vendorID
      • A 16-bit register identifying the hardware manufacturer. There is a global registry of such numbers, maintained by the PCI Special Interest Group
      • Manufacturers must apply to have a unique number assigned to them
    • deviceID
      • A 16-bit register, selected by the manufacturer; no special registration is required
      • This ID is paired with the vendorID to make a unique 32-bit identifier for the hardware device
      • We use the word "signature" to refer to this pair
    • class
      • A 16-bit value whose top 8 bits identify the base class
      • Every peripheral device belongs to a class
      • For example, "ethernet" and "token ring" are two classes belonging to the network group.
      • The "serial" and "parallel" classes belong to the "communication" group
      • Some drivers can support several similar devices with different signatures but belonging to the same class
    • subsystem vendorID and subsystem deviceID
      • Used for further identification of the device
      • If the chip is a generic interface chip to a local (onboard) bus, they are often used
  • Every PCI manufacturer assigns proper values to these read-only registers.  The driver can use them to identify the device
  • The subsystem vendorID and subsystem deviceID fields are sometimes set by the vendor to further differentiate similar devices
How does PCI hot plug work ?

MODULE_DEVICE_TABLE
  • The pci_device_id structure needs to be exported to user space to allow the hot plug and module loading systems to know what module works with what devices.
MODULE_DEVICE_TABLE(pci, i810_ids);
  • This creates a local variable called __mod_pci_device_table that points to the list of struct pci_device_id
  • Later in the kernel build process,
    • the depmod program searches all modules for the symbol __mod_pci_device_table
    • If the symbol is found, it pulls the data out of the module and adds it to the file /lib/modules/KERNEL_VERSION/modules.pcimap
  • After depmod completes, all the PCI devices that are supported by modules in the kernel are listed, along with their module names, in that file.
  • When the kernel tells the hotplug system that a new PCI device has been found, it uses the modules.pcimap file to find the proper driver

Registering the PCI driver

  • The struct pci_driver structure contains a number of callback functions and variables
    • name
    • pointer to pci device id table
    • probe
    • remove
    • suspend
    • resume
  • The pci_register_driver() function is used to register the driver, and pci_unregister_driver() is used to unregister it
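A skeleton of the registration, sketched from the structure fields listed above. The "demo" names and the 0x1234/0x5678 IDs are hypothetical placeholders, and this is kernel-module code, not a user-space program.

```c
#include <linux/module.h>
#include <linux/pci.h>

/* Hypothetical IDs: replace with your device's vendor/device pair */
static const struct pci_device_id demo_ids[] = {
    { PCI_DEVICE(0x1234, 0x5678) },
    { 0, }
};
MODULE_DEVICE_TABLE(pci, demo_ids);

static int demo_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
    return pci_enable_device(dev);   /* wake the device up before use */
}

static void demo_remove(struct pci_dev *dev)
{
    pci_disable_device(dev);
}

static struct pci_driver demo_driver = {
    .name     = "demo",
    .id_table = demo_ids,
    .probe    = demo_probe,
    .remove   = demo_remove,
};

static int __init demo_init(void)
{
    return pci_register_driver(&demo_driver);
}

static void __exit demo_exit(void)
{
    pci_unregister_driver(&demo_driver);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```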
Enabling the PCI device

  • In the probe function for the PCI driver, before the driver can access any device resources (memory or I/O), the driver must call pci_enable_device(), which
    • Enables the device
    • In some cases wakes up the device
    • In some cases assigns its interrupt line and I/O regions
Accessing the Configuration Space 
  • After the driver has detected the device, it usually needs to read from or write to the three address spaces: memory, port, and configuration
  • Accessing the configuration space is vital to the driver, because it is the only way it can find out where the device is mapped in memory and in the I/O space.
  • The microprocessor has no way to access the configuration space directly
  • The computer vendor has to provide a way to do it
  • To access the configuration space, the CPU must write and read registers in the PCI controller
  • The configuration space can be accessed in u8, u16, or u32 units via pci_read_config_byte/word/dword, where the offset is from the beginning of the configuration space. The word and dword functions convert the value just read from little-endian to the native byte order of the processor, so there is no need to deal with byte ordering
  • pci_write_config_byte/word/dword write into the configuration space. The word and dword functions convert the value to little-endian before writing it to the peripheral device
  • All of the above functions are implemented as inline functions that really call
    • pci_bus_read_config_byte/word/dword
    • pci_bus_write_config_byte/word/dword
  • The best way to address the configuration variables with the pci_read_ functions is by means of the symbolic names defined in <linux/pci.h>
The below function retrieves the revision ID 
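A sketch of such a read, using the symbolic offset PCI_REVISION_ID from <linux/pci.h> (the wrapper name is illustrative):

```c
#include <linux/pci.h>

static u8 example_get_revision(struct pci_dev *pdev)
{
	u8 revision;

	/* Single-byte read from the configuration space;
	 * PCI_REVISION_ID is the symbolic register offset */
	pci_read_config_byte(pdev, PCI_REVISION_ID, &revision);
	return revision;
}
```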

Accessing the I/O and Memory Spaces
  • PCI devices that implement I/O registers as a memory region mark the difference by setting a "memory-is-prefetchable" bit in the configuration register
  • If a memory region is marked as prefetchable, the CPU can cache its contents and do all sorts of optimization with it, e.g. video memory on a PCI board is prefetchable
  • Peripherals that map their control registers to a memory range declare that range as nonprefetchable, so accesses are not cached or reordered
  • The I/O regions of PCI devices have been integrated into the generic resource management. For this reason you don't need to access the configuration variables in order to know where your device is mapped in memory or I/O space
    • unsigned long pci_resource_start(struct pci_dev *dev, int bar);
      • returns the first address associated with one of the six PCI I/O regions. The region is selected by the integer bar (base address register), ranging from 0 to 5
    • unsigned long pci_resource_end(struct pci_dev *dev, int bar);
      • returns the last address that is part of the I/O region number bar.
      • Note that it is the last usable address, not the first address after the region
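The resource functions above are typically combined with ioremap to get a usable kernel pointer to a memory BAR. A sketch for BAR 0 (function name is illustrative):

```c
#include <linux/io.h>
#include <linux/pci.h>

static void __iomem *example_map_bar0(struct pci_dev *pdev)
{
	unsigned long start = pci_resource_start(pdev, 0);
	unsigned long len   = pci_resource_len(pdev, 0);

	/* Only map the BAR if it is a memory region, not port I/O */
	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
		return NULL;

	return ioremap(start, len);
}
```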
PCI Interrupts
  • By the time Linux boots, the firmware has already assigned a unique interrupt number to the device
  • The interrupt number is stored in configuration register 60 (PCI_INTERRUPT_LINE), which is one byte wide. This allows 256 interrupt lines, but the actual limit depends on the CPU
  • If the device does not support interrupts, register 61 (PCI_INTERRUPT_PIN) is 0; otherwise, it is nonzero
  • PCI-specific code dealing with interrupts just needs to read the configuration byte to obtain the interrupt number
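A sketch of obtaining and using that interrupt number; example_handler and the setup function name are hypothetical, and in practice drivers use pdev->irq, which the PCI core has already filled in from the configuration space:

```c
#include <linux/interrupt.h>
#include <linux/pci.h>

static int example_setup_irq(struct pci_dev *pdev)
{
	u8 irq_line;

	/* Read configuration register 60 directly, as described above */
	pci_read_config_byte(pdev, PCI_INTERRUPT_LINE, &irq_line);

	/* Normally the value cached by the PCI core is used instead */
	return request_irq(pdev->irq, example_handler, IRQF_SHARED,
			   "example_pci", pdev);
}
```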


by Naresh Bhat at 22 August 2020, 10:12