commit 9ca1d50fa959cda1f04f43275f7930a70f1a631e
Author: Greg Kroah-Hartman
Date:   Fri Jun 24 10:18:38 2016 -0700

    Linux 4.4.14

commit e917563612e5d8ad3a80efa5f43e654be50fe82f
Author: Florian Westphal
Date:   Fri Apr 1 15:37:59 2016 +0200

    netfilter: x_tables: introduce and use xt_copy_counters_from_user

    commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream.

    The three variants use the same copy&pasted code; condense this into a
    helper and use that.

    Make sure info.name is 0-terminated.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit d69f93d059c6294322bb91f9aaff796a21c3aa20
Author: Florian Westphal
Date:   Fri Apr 1 14:17:34 2016 +0200

    netfilter: x_tables: do compat validation via translate_table

    commit 09d9686047dbbe1cf4faa558d3ecc4aae2046054 upstream.

    This looks like refactoring, but it's also a bug fix.

    Problem is that the compat path (32bit iptables, 64bit kernel) lacks a
    few sanity tests that are done in the normal path.

    For example, we do not check for underflows and the base chain policies.

    While it's possible to also add such checks to the compat path, it's
    more copy&pastry; for instance we cannot reuse the check_underflow()
    helper as e->target_offset differs in the compat case.

    Other problem is that it makes auditing for validation errors harder;
    two places need to be checked and kept in sync.

    At a high level 32 bit compat works like this:
    1- initial pass over blob:
       validate match/entry offsets, bounds checking
       lookup all matches and targets
       do bookkeeping wrt. size delta of 32/64bit structures
       assign match/target.u.kernel pointer (points at kernel
       implementation, needed to access ->compatsize etc.)

    2- allocate memory according to the total bookkeeping size to
       contain the translated ruleset

    3- second pass over original blob:
       for each entry, copy the 32bit representation to the newly allocated
       memory.  This also does any special match translations (e.g.
       adjust 32bit to 64bit longs, etc).

    4- check if ruleset is free of loops (chase all jumps)

    5- first pass over translated blob:
       call the checkentry function of all matches and targets.

    The alternative implemented by this patch is to drop steps 3&4 from the
    compat process; the translation is changed into an intermediate step
    rather than a full 1:1 translate_table replacement.

    In the 2nd pass (step #3), change the 64bit ruleset back to a kernel
    representation, i.e. put() the kernel pointer and restore ->u.user.name.

    This gets us a 64bit ruleset that is in the format generated by a 64bit
    iptables userspace -- we can then use translate_table() to get the
    'native' sanity checks.

    This has two drawbacks:

    1. we re-validate all the match and target entry structure sizes even
       though compat translation is supposed to never generate bogus offsets.
    2. we put and then re-lookup each match and target.

    The upside is that we get all sanity tests and ruleset validations
    provided by the normal path and can remove some duplicated compat code.

    iptables-restore time of autogenerated ruleset with 300k chains of form
    -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002
    -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003

    shows no noticeable differences in restore times:
    old:   0m30.796s
    new:   0m31.521s
    64bit: 0m25.674s

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 3a69c0f0487a6eba5fd5005c7902a230c8e31518
Author: Florian Westphal
Date:   Fri Apr 1 14:17:33 2016 +0200

    netfilter: x_tables: xt_compat_match_from_user doesn't need a retval

    commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream.

    Always returned 0.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 0fab6d3d18866bbc4f964696ae31a5456795bb10
Author: Florian Westphal
Date:   Fri Apr 1 14:17:31 2016 +0200

    netfilter: ip6_tables: simplify translate_compat_table args

    commit 329a0807124f12fe1c8032f95d8a8eb47047fb0e upstream.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 77521be687870e2ecae748df7b8fb103dd57dac2
Author: Florian Westphal
Date:   Fri Apr 1 14:17:30 2016 +0200

    netfilter: ip_tables: simplify translate_compat_table args

    commit 7d3f843eed29222254c9feab481f55175a1afcc9 upstream.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 946e8148dba1943e1762833694ec2c476469d1fe
Author: Florian Westphal
Date:   Fri Apr 1 14:17:32 2016 +0200

    netfilter: arp_tables: simplify translate_compat_table args

    commit 8dddd32756f6fe8e4e82a63361119b7e2384e02f upstream.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit fe1e4026ce9f03653288c743218ed70ee0a2c4e0
Author: Florian Westphal
Date:   Wed Jun 1 02:04:44 2016 +0200

    netfilter: x_tables: don't reject valid target size on some architectures

    commit 7b7eba0f3515fca3296b8881d583f7c1042f5226 upstream.

    Quoting John Stultz:
      In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
      noticed I was having some trouble with networking, and realized that
      /proc/net/ip_tables_names was suddenly empty.

      Digging through the registration process, it seems we're catching on the:

       if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
           target_offset + sizeof(struct xt_standard_target) != next_offset)
             return -EINVAL;

      Where next_offset seems to be 4 bytes larger than the
      offset + standard_target struct size.

    next_offset needs to be aligned via XT_ALIGN (so we can access all
    members of ip(6)t_entry struct).

    This problem didn't show up on i686 as it only needs 4-byte alignment
    for u64, but iptables userspace on other 32bit arches does insert extra
    padding.
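
To illustrate the mismatch described above, here is a minimal user-space sketch (it is not the upstream patch). `demo_align()` is a hypothetical stand-in for XT_ALIGN, and the sizes 112 and 36 used in the note below are assumed example values for the entry and standard-target structures on a 32bit arch that aligns u64 to 8 bytes. An exact-size comparison rejects the padded layout, while an alignment-aware comparison accepts it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XT_ALIGN: round n up to a multiple of
 * `align`, which must be a power of two. */
size_t demo_align(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Exact comparison, as in the quoted snippet: any padding that
 * userspace inserted after the target makes this fail. */
bool strict_size_ok(size_t target_offset, size_t target_size,
                    size_t next_offset)
{
    return target_offset + target_size == next_offset;
}

/* Alignment-aware comparison: the end of the target is rounded up
 * before it is compared with next_offset. */
bool aligned_size_ok(size_t target_offset, size_t target_size,
                     size_t next_offset, size_t align)
{
    return demo_align(target_offset + target_size, align) == next_offset;
}
```

With the assumed sizes, 112 + 36 = 148 rounds up to 152 under 8-byte alignment, which corresponds to the 4-byte difference reported above; under 4-byte alignment (as on i686) 148 is already aligned and no padding appears.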
Reported-by: John Stultz Tested-by: John Stultz Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit caa39a1e709c3cf89c0a4d600e98f9863085e3d5 Author: Florian Westphal Date: Fri Apr 1 14:17:29 2016 +0200 netfilter: x_tables: validate all offsets and sizes in a rule commit 13631bfc604161a9d69cd68991dff8603edd66f9 upstream. Validate that all matches (if any) add up to the beginning of the target and that each match covers at least the base structure size. The compat path should be able to safely re-use the function as the structures only differ in alignment; added a BUILD_BUG_ON just in case we have an arch that adds padding as well. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 8a865621540c7bc7f03665a2b7029cb444a9593a Author: Florian Westphal Date: Fri Apr 1 14:17:28 2016 +0200 netfilter: x_tables: check for bogus target offset commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 2066499780e1455c43833b5b34858124047331ff Author: Florian Westphal Date: Fri Apr 1 14:17:27 2016 +0200 netfilter: x_tables: check standard target size too commit 7ed2abddd20cf8f6bd27f65bd218f26fa5bf7f44 upstream. We have targets and standard targets -- the latter carries a verdict. 
    The ip/ip6tables validation functions will access t->verdict for the
    standard targets to fetch the jump offset or verdict for chainloop
    detection, but this happens before the targets get checked/validated.

    Thus we also need to check for verdict presence here, else t->verdict
    can point right after a blob.

    Spotted with UBSAN while testing malformed blobs.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 2985d199e713c05eec2eaffeeeac40682aa2e5cc
Author: Florian Westphal
Date:   Fri Apr 1 14:17:26 2016 +0200

    netfilter: x_tables: add compat version of xt_check_entry_offsets

    commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream.

    32bit rulesets have different layout and alignment requirements, so once
    more integrity checks get added to xt_check_entry_offsets it will reject
    well-formed 32bit rulesets.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit ed30e07de077354993122c5d88e535cbe0a03754
Author: Florian Westphal
Date:   Fri Apr 1 14:17:25 2016 +0200

    netfilter: x_tables: assert minimum target size

    commit a08e4e190b866579896c09af59b3bdca821da2cd upstream.

    The target size includes the size of the xt_entry_target struct.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 6bc803b795631cb14a6a3ea6d433589c8c666cc3
Author: Florian Westphal
Date:   Fri Apr 1 14:17:24 2016 +0200

    netfilter: x_tables: kill check_entry helper

    commit aa412ba225dd3bc36d404c28cdc3d674850d80d0 upstream.

    Once we add more sanity testing to xt_check_entry_offsets it
    becomes relevant if we're expecting a 32bit 'config_compat' blob
    or a normal one.

    Since we already have a lot of similar-named functions (check_entry,
    compat_check_entry, find_and_check_entry, etc.) and the current
    incarnation is short, just fold its contents into the callers.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit cfdca13028ff3aa8c5f4b63ba5abf878cd55ced5
Author: Florian Westphal
Date:   Fri Apr 1 14:17:23 2016 +0200

    netfilter: x_tables: add and use xt_check_entry_offsets

    commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream.

    Currently arp/ip and ip6tables each implement a short helper to check
    that the target offset is large enough to hold one xt_entry_target
    struct and that t->u.target_size fits within the current rule.

    Unfortunately these checks are not sufficient.

    To avoid adding new tests to all of ip/ip6/arptables, move the current
    checks into a helper, then extend this helper in followup patches.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 611d408a531fdbecf07e268ac87e37d71dd5cd8e
Author: Florian Westphal
Date:   Fri Apr 1 14:17:22 2016 +0200

    netfilter: x_tables: validate targets of jumps

    commit 36472341017529e2b12573093cc0f68719300997 upstream.

    When we see a jump, also check that the offset gets us to the beginning
    of a rule (an ipt_entry).

    The extra overhead is negligible, even with absurd cases.

    300k custom rules, 300k jumps to 'next' user chain:
    [ plus one jump from INPUT to first userchain ]:

    Before:
    real    0m24.874s
    user    0m7.532s
    sys     0m16.076s

    After:
    real    0m27.464s
    user    0m7.436s
    sys     0m18.840s

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit d6f7cd1b21b9e797e09269ee16655f9c0e4a3fa1
Author: Florian Westphal
Date:   Fri Apr 1 14:17:21 2016 +0200

    netfilter: x_tables: don't move to non-existent next rule

    commit f24e230d257af1ad7476c6e81a8dc3127a74204e upstream.

    Ben Hawkes says:

     In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
     is possible for a user-supplied ipt_entry structure to have a large
     next_offset field. This field is not bounds checked prior to writing a
     counter value at the supplied offset.

    Base chains enforce absolute verdict.
User defined chains are supposed to end with an unconditional return, xtables userspace adds them automatically. But if such return is missing we will move to non-existent next rule. Reported-by: Ben Hawkes Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 0d6ad54b74fd66ca016076900da97e96272ff83e Author: Maarten Lankhorst Date: Wed May 4 14:38:26 2016 +0200 drm/core: Do not preserve framebuffer on rmfb, v4. commit f2d580b9a8149735cbc4b59c4a8df60173658140 upstream. It turns out that preserving framebuffers after the rmfb call breaks vmwgfx userspace. This was originally introduced because it was thought nobody relied on the behavior, but unfortunately it seems there are exceptions. drm_framebuffer_remove may fail with -EINTR now, so a straight revert is impossible. There is no way to remove the framebuffer from the lists and active planes without introducing a race because of the different locking requirements. Instead call drm_framebuffer_remove from a workqueue, which is unaffected by signals. Changes since v1: - Add comment. Changes since v2: - Add fastpath for refcount = 1. (danvet) Changes since v3: - Rebased. - Restore lastclose framebuffer removal too. Fixes: 13803132818c ("drm/core: Preserve the framebuffer after removing it.") Testcase: kms_rmfb_basic References: https://lists.freedesktop.org/archives/dri-devel/2016-March/102876.html Cc: Thomas Hellstrom Cc: David Herrmann Reviewed-by: Daniel Vetter Tested-by: Thomas Hellstrom #v3 Tested-by: Tvrtko Ursulin Signed-off-by: Daniel Vetter Link: http://patchwork.freedesktop.org/patch/msgid/6c63ca37-0e7e-ac7f-a6d2-c7822e3d611f@linux.intel.com Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman commit dbea3ce55ad13ddccf519a95ce0d77a5064e9ccc Author: Tadeusz Struk Date: Fri Apr 29 10:43:40 2016 -0700 crypto: qat - fix adf_ctl_drv.c:undefined reference to adf_init_pf_wq commit 6dc5df71ee5c8b44607928bfe27be50314dcf848 upstream. 

    Fix undefined reference issue reported by kbuild test robot.

    Reported-by: kbuild test robot
    Signed-off-by: Tadeusz Struk
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

commit 5ebdccd7685f1c0b451c516f99082642d8d49003
Author: Florian Westphal
Date:   Tue Mar 22 18:02:52 2016 +0100

    netfilter: x_tables: fix unconditional helper

    commit 54d83fc74aa9ec72794373cb47432c5f7fb1a309 upstream.

    Ben Hawkes says:

     In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
     is possible for a user-supplied ipt_entry structure to have a large
     next_offset field. This field is not bounds checked prior to writing a
     counter value at the supplied offset.

    Problem is that mark_source_chains should not have been called -- the
    rule doesn't have a next entry, so it's supposed to return an absolute
    verdict of either ACCEPT or DROP.

    However, the function unconditional() doesn't work as the name implies.
    It only checks that the rule is using wildcard address matching.

    However, an unconditional rule must also not be using any matches
    (no -m args).

    The underflow validator only checked the addresses, therefore passing
    the 'unconditional absolute verdict' test, while mark_source_chains also
    tested for presence of matches, and thus proceeded to the next
    (non-existent) rule.

    Unify this so that all the callers have the same idea of 'unconditional
    rule'.

    Reported-by: Ben Hawkes
    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: Greg Kroah-Hartman

commit 868fe2536f8741ebf807ed717734e6c321c478e9
Author: Florian Westphal
Date:   Tue Mar 22 18:02:50 2016 +0100

    netfilter: x_tables: make sure e->next_offset covers remaining blob size

    commit 6e94e0cfb0887e4013b3b930fa6ab1fe6bb6ba91 upstream.

    Otherwise this function may read data beyond the ruleset blob.
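
A minimal sketch of this kind of bounds check, written as user-space C. The `struct demo_entry` layout and `demo_blob_ok()` are hypothetical and far simpler than the real ipt_entry handling; the point is only that each entry's next_offset is compared against the bytes remaining in the blob before it is followed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified rule header. */
struct demo_entry {
    uint16_t target_offset;
    uint16_t next_offset;   /* distance in bytes to the next entry */
};

/* Walk a blob of entries; return false as soon as an entry header or
 * its next_offset would reach past the end of the blob. */
bool demo_blob_ok(const unsigned char *blob, size_t size)
{
    size_t off = 0;

    while (off < size) {
        struct demo_entry e;

        if (size - off < sizeof(e))
            return false;               /* header does not fit */
        memcpy(&e, blob + off, sizeof(e));  /* avoids unaligned reads */

        if (e.next_offset < sizeof(e))
            return false;               /* would not advance */
        if (e.next_offset > size - off)
            return false;               /* points beyond the blob */

        off += e.next_offset;
    }
    return true;
}
```

Comparing against `size - off` rather than computing `off + next_offset` keeps the arithmetic free of overflow in this sketch.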
Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 59ff9f9b38b39afeb167fdc16c52976587b2c45a Author: Florian Westphal Date: Tue Mar 22 18:02:49 2016 +0100 netfilter: x_tables: validate e->target_offset early commit bdf533de6968e9686df777dc178486f600c6e617 upstream. We should check that e->target_offset is sane before mark_source_chains gets called since it will fetch the target entry for loop detection. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit ccb85835a159923d7f79fd51cfd614962426ecf2 Author: Ralf Baechle Date: Thu Feb 4 01:24:40 2016 +0100 MIPS: Fix 64k page support for 32 bit kernels. commit d7de413475f443957a0c1d256e405d19b3a2cb22 upstream. TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a multiple of the page size. Somewhere further down the math fails such that executing an ELF binary fails. Signed-off-by: Ralf Baechle Tested-by: Joshua Henderson Cc: James Hogan Signed-off-by: Greg Kroah-Hartman commit 561e4453dd06f56cd8a61ced33964189b3651558 Author: David S. Miller Date: Sat May 28 20:41:12 2016 -0700 sparc64: Fix return from trap window fill crashes. [ Upstream commit 7cafc0b8bf130f038b0ec2dcdd6a9de6dc59b65a ] We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. 
Otherwise we can get an OOPS that looks like this: ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. 

    The userland register window fill handler is:

            add     %sp, STACK_BIAS + 0x00, %g1;            \
            ldxa    [%g1 + %g0] ASI, %l0;                   \
            mov     0x08, %g2;                              \
            mov     0x10, %g3;                              \
            ldxa    [%g1 + %g2] ASI, %l1;                   \
            mov     0x18, %g5;                              \
            ldxa    [%g1 + %g3] ASI, %l2;                   \
            ldxa    [%g1 + %g5] ASI, %l3;                   \
            add     %g1, 0x20, %g1;                         \
            ldxa    [%g1 + %g0] ASI, %l4;                   \
            ldxa    [%g1 + %g2] ASI, %l5;                   \
            ldxa    [%g1 + %g3] ASI, %l6;                   \
            ldxa    [%g1 + %g5] ASI, %l7;                   \
            add     %g1, 0x20, %g1;                         \
            ldxa    [%g1 + %g0] ASI, %i0;                   \
            ldxa    [%g1 + %g2] ASI, %i1;                   \
            ldxa    [%g1 + %g3] ASI, %i2;                   \
            ldxa    [%g1 + %g5] ASI, %i3;                   \
            add     %g1, 0x20, %g1;                         \
            ldxa    [%g1 + %g0] ASI, %i4;                   \
            ldxa    [%g1 + %g2] ASI, %i5;                   \
            ldxa    [%g1 + %g3] ASI, %i6;                   \
            ldxa    [%g1 + %g5] ASI, %i7;                   \
            restored;                                       \
            retry; nop; nop; nop; nop;                      \
            b,a,pt  %xcc, fill_fixup_dax;                   \
            b,a,pt  %xcc, fill_fixup_mna;                   \
            b,a,pt  %xcc, fill_fixup;

    And the way this works is that if any of those memory accesses
    generate an exception, the exception handler can revector to one of
    those final three branch instructions depending upon which kind of
    exception the memory access took.  In this way, the fault handler
    doesn't have to know if it was a spill or a fill that it's handling
    the fault for.  It just always branches to the last instruction in
    the parent trap's handler.

    For example, for a regular fault, the code goes:

    winfix_trampoline:
            rdpr    %tpc, %g3
            or      %g3, 0x7c, %g3
            wrpr    %g3, %tnpc
            done

    All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
    trap time program counter, we'll get that final instruction in the
    trap handler.

    On return from trap, we have to pull the register window in but we do
    this by hand instead of just executing a "restore" instruction for
    several reasons.  The largest being that from Niagara and onward we
    simply don't have enough levels in the trap stack to fully resolve all
    possible exception cases of a window fault when we are already at
    trap level 1 (which we enter to get ready to return from the original
    trap).

    This is executed inline via the FILL_*_RTRAP handlers.
    rtrap_64.S's code branches directly to these to do the window fill by
    hand if necessary.  Now if you look at them, we'll see at the end:

            ba,a,pt    %xcc, user_rtt_fill_fixup;
            ba,a,pt    %xcc, user_rtt_fill_fixup;
            ba,a,pt    %xcc, user_rtt_fill_fixup;

    And oops, all three cases are handled like a fault.

    This doesn't work because each of these trap types (data access
    exception, memory address unaligned, and faults) store their auxiliary
    info in different registers to pass on to the C handler which does the
    real work.

    So in the case where the stack was unaligned, the unaligned trap
    handler sets up the arg registers one way, and then we branched to
    the fault handler which expects them set up another way.

    So the FAULT_TYPE_* value ends up basically being garbage, and
    randomly would generate the backtrace seen above.

    Reported-by: Nick Alcock
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 1fda90c39d8ef6acbedfd3cd9bd710a5bcc490c3
Author: David S. Miller
Date:   Sat May 28 21:21:31 2016 -0700

    sparc: Harden signal return frame checks.

    [ Upstream commit d11c2a0de2824395656cf8ed15811580c9dd38aa ]

    All signal frames must be at least 16-byte aligned, because that is
    the alignment we explicitly create when we build signal return stack
    frames.

    All stack pointers must be at least 8-byte aligned.

    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 6bb3290ce9662055efcf13dc18c12bb62f6f39dc
Author: David S. Miller
Date:   Wed May 25 12:51:20 2016 -0700

    sparc64: Take ctx_alloc_lock properly in hugetlb_setup().

    [ Upstream commit 9ea46abe22550e3366ff7cee2f8391b35b12f730 ]

    On cheetahplus chips we take the ctx_alloc_lock in order to
    modify the TLB lookup parameters for the indexed TLBs, which
    are stored in the context register.

    This is called with interrupts disabled, however ctx_alloc_lock
    is an IRQ safe lock, therefore we must acquire/release it
    properly with spin_{lock,unlock}_irq().

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit 87575e31be28afb08665f412ac269909c5911a33 Author: Nitin Gupta Date: Wed Mar 30 11:17:13 2016 -0700 sparc64: Reduce TLB flushes during hugepte changes [ Upstream commit 24e49ee3d76b70853a96520e46b8837e5eae65b2 ] During hugepage map/unmap, TSB and TLB flushes are currently issued at every PAGE_SIZE'd boundary which is unnecessary. We now issue the flush at REAL_HPAGE_SIZE boundaries only. Without this patch workloads which unmap a large hugepage backed VMA region get CPU lockups due to excessive TLB flush calls. Orabug: 22365539, 22643230, 22995196 Signed-off-by: Nitin Gupta Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ccd02310db44df820d1e8c54a97daf596dea1c9e Author: Babu Moger Date: Thu Mar 24 13:02:22 2016 -0700 sparc/PCI: Fix for panic while enabling SR-IOV [ Upstream commit d0c31e02005764dae0aab130a57e9794d06b824d ] We noticed this panic while enabling SR-IOV in sparc. mlx4_core: Mellanox ConnectX core driver v2.2-1 (Jan 1 2015) mlx4_core: Initializing 0007:01:00.0 mlx4_core 0007:01:00.0: Enabling SR-IOV with 5 VFs mlx4_core: Initializing 0007:01:00.1 Unable to handle kernel NULL pointer dereference insmod(10010): Oops [#1] CPU: 391 PID: 10010 Comm: insmod Not tainted 4.1.12-32.el6uek.kdump2.sparc64 #1 TPC: I7: <__mlx4_init_one+0x324/0x500 [mlx4_core]> Call Trace: [00000000104c5ea4] __mlx4_init_one+0x324/0x500 [mlx4_core] [00000000104c613c] mlx4_init_one+0xbc/0x120 [mlx4_core] [0000000000725f14] local_pci_probe+0x34/0xa0 [0000000000726028] pci_call_probe+0xa8/0xe0 [0000000000726310] pci_device_probe+0x50/0x80 [000000000079f700] really_probe+0x140/0x420 [000000000079fa24] driver_probe_device+0x44/0xa0 [000000000079fb5c] __device_attach+0x3c/0x60 [000000000079d85c] bus_for_each_drv+0x5c/0xa0 [000000000079f588] device_attach+0x88/0xc0 [000000000071acd0] pci_bus_add_device+0x30/0x80 [0000000000736090] virtfn_add.clone.1+0x210/0x360 [00000000007364a4] sriov_enable+0x2c4/0x520 [000000000073672c] 
pci_enable_sriov+0x2c/0x40 [00000000104c2d58] mlx4_enable_sriov+0xf8/0x180 [mlx4_core] [00000000104c49ac] mlx4_load_one+0x42c/0xd40 [mlx4_core] Disabling lock debugging due to kernel taint Caller[00000000104c5ea4]: __mlx4_init_one+0x324/0x500 [mlx4_core] Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core] Caller[0000000000725f14]: local_pci_probe+0x34/0xa0 Caller[0000000000726028]: pci_call_probe+0xa8/0xe0 Caller[0000000000726310]: pci_device_probe+0x50/0x80 Caller[000000000079f700]: really_probe+0x140/0x420 Caller[000000000079fa24]: driver_probe_device+0x44/0xa0 Caller[000000000079fb5c]: __device_attach+0x3c/0x60 Caller[000000000079d85c]: bus_for_each_drv+0x5c/0xa0 Caller[000000000079f588]: device_attach+0x88/0xc0 Caller[000000000071acd0]: pci_bus_add_device+0x30/0x80 Caller[0000000000736090]: virtfn_add.clone.1+0x210/0x360 Caller[00000000007364a4]: sriov_enable+0x2c4/0x520 Caller[000000000073672c]: pci_enable_sriov+0x2c/0x40 Caller[00000000104c2d58]: mlx4_enable_sriov+0xf8/0x180 [mlx4_core] Caller[00000000104c49ac]: mlx4_load_one+0x42c/0xd40 [mlx4_core] Caller[00000000104c5f90]: __mlx4_init_one+0x410/0x500 [mlx4_core] Caller[00000000104c613c]: mlx4_init_one+0xbc/0x120 [mlx4_core] Caller[0000000000725f14]: local_pci_probe+0x34/0xa0 Caller[0000000000726028]: pci_call_probe+0xa8/0xe0 Caller[0000000000726310]: pci_device_probe+0x50/0x80 Caller[000000000079f700]: really_probe+0x140/0x420 Caller[000000000079fa24]: driver_probe_device+0x44/0xa0 Caller[000000000079fb08]: __driver_attach+0x88/0xa0 Caller[000000000079d90c]: bus_for_each_dev+0x6c/0xa0 Caller[000000000079f29c]: driver_attach+0x1c/0x40 Caller[000000000079e35c]: bus_add_driver+0x17c/0x220 Caller[00000000007a02d4]: driver_register+0x74/0x120 Caller[00000000007263fc]: __pci_register_driver+0x3c/0x60 Caller[00000000104f62bc]: mlx4_init+0x60/0xcc [mlx4_core] Kernel panic - not syncing: Fatal exception Press Stop-A (L1-A) to return to the boot prom ---[ end Kernel panic - not syncing: Fatal exception 

    Details:
    Here is the call sequence
    virtfn_add->__mlx4_init_one->dma_set_mask->dma_supported

    The panic happened at line 760(file arch/sparc/kernel/iommu.c)

    758 int dma_supported(struct device *dev, u64 device_mask)
    759 {
    760         struct iommu *iommu = dev->archdata.iommu;
    761         u64 dma_addr_mask = iommu->dma_addr_mask;
    762
    763         if (device_mask >= (1UL << 32UL))
    764                 return 0;
    765
    766         if ((device_mask & dma_addr_mask) == dma_addr_mask)
    767                 return 1;
    768
    769 #ifdef CONFIG_PCI
    770         if (dev_is_pci(dev))
    771                 return pci64_dma_supported(to_pci_dev(dev), device_mask);
    772 #endif
    773
    774         return 0;
    775 }
    776 EXPORT_SYMBOL(dma_supported);

    Same panic happened with Intel ixgbe driver also.

    SR-IOV code looks for arch specific data while enabling
    VFs. When VF device is added, driver probe function makes set
    of calls to initialize the pci device. Because the VF device is
    added different way than the normal PF device(which happens via
    of_create_pci_dev for sparc), some of the arch specific initialization
    does not happen for VF device.  That causes panic when archdata is
    accessed.

    To fix this, I have used already defined weak function
    pcibios_setup_device to copy archdata from PF to VF.
    Also verified the fix.

    Signed-off-by: Babu Moger
    Signed-off-by: Sowmini Varadhan
    Reviewed-by: Ethan Zhao
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b1206090828584bcb1caf4c850d175f297eb7bc8
Author: David S. Miller
Date:   Tue Mar 1 00:25:32 2016 -0500

    sparc64: Fix sparc64_set_context stack handling.

    [ Upstream commit 397d1533b6cce0ccb5379542e2e6d079f6936c46 ]

    Like a signal return, we should use synchronize_user_stack() rather
    than flush_user_windows().

    Reported-by: Ilya Malakhov
    Signed-off-by: David S.
Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4185bd68ef40b275669efec624c0d792cd7a2acf
Author: Nitin Gupta
Date:   Tue Jan 5 22:35:35 2016 -0800

    sparc64: Fix numa node distance initialization

    [ Upstream commit 36beca6571c941b28b0798667608239731f9bc3a ]

    Orabug: 22495713

    Currently, NUMA node distance matrix is initialized only
    when a machine descriptor (MD) exists. However, sun4u
    machines (e.g. Sun Blade 2500) do not have an MD and thus
    distance values were left uninitialized. The initialization
    is now moved such that it happens on both sun4u and sun4v.

    Signed-off-by: Nitin Gupta
    Tested-by: Mikael Pettersson
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit e9c74337a7c03d33f2afd5bb341cc20ad209698c
Author: David S. Miller
Date:   Wed Apr 27 17:27:37 2016 -0400

    sparc64: Fix bootup regressions on some Kconfig combinations.

    [ Upstream commit 49fa5230462f9f2c4e97c81356473a6bdf06c422 ]

    The system call tracing bug fix mentioned in the Fixes tag
    below increased the amount of assembler code in the sequence
    of assembler files included by head_64.S.

    This caused the total set of code to exceed 0x4000 bytes in
    size, which overflows the expression in head_64.S that works
    to place swapper_tsb at address 0x408000.

    When this is violated, the TSB is not properly aligned, and
    also the trap table is not aligned properly either.  All of
    this together results in failed boots.

    So, do two things:

    1) Simplify some code by using ba,a instead of ba/nop to get
       those bytes back.

    2) Add a linker script assertion to make sure that if this
       happens again the build will fail.

    Fixes: 1a40b95374f6 ("sparc: Fix system call tracing register handling.")
    Reported-by: Meelis Roos
    Reported-by: Joerg Abraham
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit c9bc125c922e855055cd08b2ec064180218be161
Author: Mike Frysinger
Date:   Mon Jan 18 06:32:30 2016 -0500

    sparc: Fix system call tracing register handling.

    [ Upstream commit 1a40b95374f680625318ab61d81958e949e0afe3 ]

    A system call trace trigger on entry allows the tracing
    process to inspect and potentially change the traced
    process's registers.

    Account for that by reloading the %g1 (syscall number)
    and %i0-%i5 (syscall argument) values.  We need to be
    careful to revalidate the range of %g1, and reload the
    system call table entry it corresponds to into %l7.

    Reported-by: Mike Frysinger
    Signed-off-by: David S. Miller
    Tested-by: Mike Frysinger
    Signed-off-by: Greg Kroah-Hartman

commit 2b11d80e1aa70b56c6431e4dc3c686ffc61a73bf
Author: Al Viro
Date:   Tue Jun 7 21:26:55 2016 -0400

    fix d_walk()/non-delayed __d_free() race

    commit 3d56c25e3bb0726a5c5e16fc2d9e38f8ed763085 upstream.

    Ascend-to-parent logic in d_walk() depends on all encountered child
    dentries not getting freed without an RCU delay.  Unfortunately, in
    quite a few cases it is not true, with hard-to-hit oopsable race as
    the result.

    Fortunately, the fix is simple; right now the rule is "if it has ever
    been hashed, freeing must be delayed" and changing it to "if it
    ever had a parent, freeing must be delayed" closes that hole and
    covers all cases the old rule used to cover.  Moreover, pipes and
    sockets remain _not_ covered, so we do not introduce RCU delay in
    the cases which are the reason for having that delay conditional
    in the first place.

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

commit c08b1a593a042ae01e788ec5504bee2cfc83e1f2
Author: Jann Horn
Date:   Wed Jun 1 11:55:07 2016 +0200

    sched: panic on corrupted stack end

    commit 29d6455178a09e1dc340380c582b13356227e8df upstream.

    Until now, hitting this BUG_ON caused a recursive oops (because oops
    handling involves do_exit(), which calls into the scheduler, which in
    turn raises an oops), which caused stuff below the stack to be
    overwritten until a panic happened (e.g.  via an oops in interrupt
    context, caused by the overwritten CPU index in the thread_info).

    Just panic directly.
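
As a rough user-space illustration of the check involved (this is not the kernel code): a magic word sits at the end of the stack, and once it is found overwritten the only safe reaction is to stop immediately instead of entering a recovery path that would use the same stack. `struct demo_stack` and the `demo_*` helpers are hypothetical; the magic value is the kernel's STACK_END_MAGIC, and abort() stands in for panic().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical model of a downward-growing stack: the marker occupies
 * the lowest word, so an overflow overwrites it first. */
struct demo_stack {
    unsigned long end_marker;
    unsigned long frames[63];
};

void demo_stack_init(struct demo_stack *s)
{
    s->end_marker = DEMO_STACK_END_MAGIC;
}

bool demo_stack_end_corrupted(const struct demo_stack *s)
{
    return s->end_marker != DEMO_STACK_END_MAGIC;
}

/* Run on every (simulated) scheduling decision. */
void demo_schedule_check(const struct demo_stack *s)
{
    if (demo_stack_end_corrupted(s)) {
        /* No cleanup and no retry: both would run on the damaged
         * stack and could hit the same check again. */
        fputs("corrupted stack end detected\n", stderr);
        abort();
    }
}
```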
Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 9beb96b344c846779f67d1be1cdafc66562b94ec Author: Jann Horn Date: Wed Jun 1 11:55:05 2016 +0200 proc: prevent stacking filesystems on top commit e54ad7f1ee263ffa5a2de9c609d58dfa27b21cd9 upstream. This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 035a94d8d1acdb829575a987222a6d5c45e8a5f2 Author: Andy Lutomirski Date: Tue May 24 15:54:04 2016 -0700 x86/entry/traps: Don't force in_interrupt() to return true in IST handlers commit aaee8c3c5cce2d9107310dd9f3026b4f901d441c upstream. Forcing in_interrupt() to return true if we're not in a bona fide interrupt confuses the softirq code. This fixes warnings like: NOHZ: local_softirq_pending 282 ... which can happen when running things like selftests/x86. This will change perf's static percpu buffer usage in IST context. I think this is okay, and it's changing the behavior to match historical (pre-4.0) behavior. Signed-off-by: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. 
Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 47648b5862145187fc8273de0b5330bb9968feb3 Author: Prasun Maiti Date: Mon Jun 6 20:04:19 2016 +0530 wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel commit 3d5fdff46c4b2b9534fa2f9fc78e90a48e0ff724 upstream. iwpriv app uses iw_point structure to send data to Kernel. The iw_point structure holds a pointer. For compatibility Kernel converts the pointer as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some drivers may use iw_handler_def.private_args to populate iwpriv commands instead of iw_handler_def.private. For those case, the IOCTLs from SIOCIWFIRSTPRIV to SIOCIWLASTPRIV will follow the path ndo_do_ioctl(). Accordingly when the filled up iw_point structure comes from 32 bit iwpriv to 64 bit Kernel, Kernel will not convert the pointer and sends it to driver. So, the driver may get the invalid data. The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory. This patch adds pointer conversion from 32 bit to 64 bit and vice versa, if the ioctl comes from 32 bit iwpriv to 64 bit Kernel. Signed-off-by: Prasun Maiti Signed-off-by: Ujjal Roy Tested-by: Dibyajyoti Ghosh Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit dea2cf7c0c6e42ccb1eea2baba028163597bcf22 Author: Jann Horn Date: Wed Jun 1 11:55:06 2016 +0200 ecryptfs: forbid opening files without mmap handler commit 2f36db71009304b3f0b95afacd8eba1f9f046b87 upstream. This prevents users from triggering a stack overflow through a recursive invocation of pagefault handling that involves mapping procfs files into virtual memory. 
Signed-off-by: Jann Horn Acked-by: Tyler Hicks Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit d3f97524ef1b2b12df6669a701e66c02f1da523d Author: Tejun Heo Date: Fri Jun 3 14:55:44 2016 -0700 memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem() commit 3a06bb78ceeceacc86a1e31133a7944013f9775b upstream. memcg_offline_kmem() may be called from memcg_free_kmem() after a css init failure. memcg_free_kmem() is a ->css_free callback which is called without cgroup_mutex and memcg_offline_kmem() ends up using css_for_each_descendant_pre() without any locking. Fix it by adding rcu read locking around it. mkdir: cannot create directory `65530': No space left on device =============================== [ INFO: suspicious RCU usage. ] 4.6.0-work+ #321 Not tainted ------------------------------- kernel/cgroup.c:4008 cgroup_mutex or RCU read lock required! [ 527.243970] other info that might help us debug this: [ 527.244715] rcu_scheduler_active = 1, debug_locks = 0 2 locks held by kworker/0:5/1664: #0: ("cgroup_destroy"){.+.+..}, at: [] process_one_work+0x165/0x4a0 #1: ((&css->destroy_work)#3){+.+...}, at: [] process_one_work+0x165/0x4a0 [ 527.248098] stack backtrace: CPU: 0 PID: 1664 Comm: kworker/0:5 Not tainted 4.6.0-work+ #321 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 Workqueue: cgroup_destroy css_free_work_fn Call Trace: dump_stack+0x68/0xa1 lockdep_rcu_suspicious+0xd7/0x110 css_next_descendant_pre+0x7d/0xb0 memcg_offline_kmem.part.44+0x4a/0xc0 mem_cgroup_css_free+0x1ec/0x200 css_free_work_fn+0x49/0x5e0 process_one_work+0x1c5/0x4a0 worker_thread+0x49/0x490 kthread+0xea/0x100 ret_from_fork+0x1f/0x40 Link: http://lkml.kernel.org/r/20160526203018.GG23194@mtj.duckdns.org Signed-off-by: Tejun Heo Acked-by: Vladimir Davydov Acked-by: Johannes Weiner Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 
1125f3b09513831b6863a1ed78fb0d1750105bfc Author: Helge Deller Date: Sat Jun 4 17:21:33 2016 +0200 parisc: Fix pagefault crash in unaligned __get_user() call commit 8b78f260887df532da529f225c49195d18fef36b upstream. One of the debian buildd servers had this crash in the syslog without any other information: Unaligned handler failed, ret = -2 clock_adjtime (pid 22578): Unaligned data reference (code 28) CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000 YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI PSW: 00001000000001001111100000001111 Tainted: G E r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0 r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4 r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218 r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000 r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0 r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218 sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000 sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88 IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff ORIG_R28: 00000002369fe628 IAOQ[0]: compat_get_timex+0x2dc/0x3c0 IAOQ[1]: compat_get_timex+0x2e0/0x3c0 RP(r2): compat_get_timex+0x40/0x3c0 Backtrace: [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0 [<0000000040205024>] syscall_exit+0x0/0x14 This means the userspace program clock_adjtime called the clock_adjtime() syscall and then crashed inside the compat_get_timex() function. 
Syscalls should never crash programs, but instead return EFAULT. The IIR register contains the executed instruction, which disassembles into "ldw 0(sr3,r5),r9". This load-word instruction is part of __get_user() which tried to read the word at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The unaligned handler is able to emulate all ldw instructions, but it fails if it cannot read the source, e.g. because of a page fault. The following program reproduces the problem:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* allocate 8k */
        char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        /* free second half (upper 4k) and make it invalid. */
        munmap(ptr+4096, 4096);
        /* syscall where first int is unaligned and clobbers into invalid memory region */
        /* syscall should return EFAULT */
        return syscall(__NR_clock_adjtime, 0, ptr+4095);
    }

To fix this issue we simply need to check if the faulting instruction address is in the exception fixup table when the unaligned handler failed. If it is, call the fixup routine instead of crashing. While looking at the unaligned handler I found another issue as well: The target register should not be modified if the handler was unsuccessful. Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit b5ff1d6012726f539723c24712bb71976ee2bc77 Author: hongkun.cao Date: Sat May 21 15:23:39 2016 +0800 pinctrl: mediatek: fix dual-edge code defect commit 5edf673d07fdcb6498be24914f3f38f8d8843199 upstream. When a dual-edge irq is triggered, an incorrect irq will be reported on condition that the external signal is not stable and this incorrect irq has been registered. Correct the register offset.
Signed-off-by: Hongkun Cao Reviewed-by: Matthias Brugger Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit a976f62a601a763ea37d116d5a9009a2eec9d0f3 Author: Thomas Huth Date: Tue May 31 07:51:17 2016 +0200 powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call commit 7cc851039d643a2ee7df4d18177150f2c3a484f5 upstream. If we do not provide the PVR for POWER8NVL, a guest on this system currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU does not provide a generic PowerISA 2.07 mode yet. So some new instructions from POWER8 (like "mtvsrd") get disabled for the guest, resulting in crashes when using code compiled explicitly for POWER8 (e.g. with the "-mcpu=power8" option of GCC). Fixes: ddee09c099c3 ("powerpc: Add PVR for POWER8NVL processor") Signed-off-by: Thomas Huth Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit cac2863ff3e64221f8888da9fcf72080181e91a8 Author: Thomas Huth Date: Thu May 12 13:29:11 2016 +0200 powerpc: Use privileged SPR number for MMCR2 commit 8dd75ccb571f3c92c48014b3dabd3d51a115ab41 upstream. We are already using the privileged versions of MMCR0, MMCR1 and MMCRA in the kernel, so for MMCR2, we should better use the privileged versions, too, to be consistent. Fixes: 240686c13687 ("powerpc: Initialise PMU related regs on Power8") Suggested-by: Paul Mackerras Signed-off-by: Thomas Huth Acked-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 4f27ca0e25df7b317983a1c8a1febecbeae81813 Author: Thomas Huth Date: Thu May 12 13:26:44 2016 +0200 powerpc: Fix definition of SIAR and SDAR registers commit d23fac2b27d94aeb7b65536a50d32bfdc21fe01e upstream. The SIAR and SDAR registers are available twice, one time as SPRs 780 / 781 (unprivileged, but read-only), and one time as the SPRs 796 / 797 (privileged, but read and write). 
The Linux kernel code currently uses the unprivileged SPRs - while this is OK for reading, writing to that register of course does not work. Since the KVM code tries to write to this register, too (see the mtspr in book3s_hv_rmhandlers.S), the contents of this register sometimes get lost for the guests, e.g. during migration of a VM. To fix this issue, simply switch to the privileged SPR numbers instead. Signed-off-by: Thomas Huth Acked-by: Paul Mackerras Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit baa6dfd627b460f73a8b79fd4cf87ea31e690e36 Author: Russell Currey Date: Thu Apr 7 16:28:26 2016 +1000 powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge commit 871e178e0f2c4fa788f694721a10b4758d494ce1 upstream. In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the spec states that values of 9900-9905 can be returned, indicating that software should delay for 10^x (where x is the last digit, i.e. 990x) milliseconds and attempt the call again. Currently, the kernel doesn't know about this, and respecting it fixes some PCI failures when the hypervisor is busy. The delay is capped at 0.2 seconds. Signed-off-by: Russell Currey Acked-by: Gavin Shan Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 5e8b53a4db947494f1d808469a411f7f2f8bb3ca Author: Will Deacon Date: Tue Jun 7 17:55:15 2016 +0100 arm64: mm: always take dirty state from new pte in ptep_set_access_flags commit 0106d456c4cb1770253fefc0ab23c9ca760b43f7 upstream. Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") ensured that pte flags are updated atomically in the face of potential concurrent, hardware-assisted updates. However, Alex reports that: | This patch breaks swapping for me. | In the broken case, you'll see either systemd cpu time spike (because | it's stuck in a page fault loop) or the system hang (because the | application owning the screen is stuck in a page fault loop). 
It turns out that this is because the 'dirty' argument to ptep_set_access_flags is always 0 for read faults, and so we can't use it to set PTE_RDONLY. The failing sequence is: 1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte 2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF 3. A read faults due to the missing access flag 4. ptep_set_access_flags is called with dirty = 0, due to the read fault 5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!) 6. A write faults, but pte_write is true so we get stuck The solution is to check the new page table entry (as would be done by the generic, non-atomic definition of ptep_set_access_flags that just calls set_pte_at) to establish the dirty state. Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas Reported-by: Alexander Graf Tested-by: Alexander Graf Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit d0bc1f47b8a7eee50b10035c645f6e6d7e719a62 Author: Catalin Marinas Date: Tue May 31 15:55:03 2016 +0100 arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks commit e47b020a323d1b2a7b1e9aac86e99eae19463630 upstream. This patch brings the PER_LINUX32 /proc/cpuinfo format more in line with the 32-bit ARM one by providing an additional line: model name : ARMv8 Processor rev X (v8l) Acked-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 774920eece6e15f6560ba0ad5a9b25eb43d075fe Author: Tom Lendacky Date: Fri May 20 17:33:03 2016 -0500 crypto: ccp - Fix AES XTS error for request sizes above 4096 commit ab6a11a7c8ef47f996974dd3c648c2c0b1a36ab1 upstream. The ccp-crypto module for AES XTS support has a bug that can allow requests greater than 4096 bytes in size to be passed to the CCP hardware. The CCP hardware does not support request sizes larger than 4096, resulting in incorrect output. 
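The 4096-byte limit described above amounts to a dispatch decision made before a request reaches the hardware. The following is only a minimal sketch of that idea, not the ccp-crypto code: the names `HW_MAX_XTS_BYTES`, `xts_path` and `choose_xts_path` are hypothetical and exist purely for illustration.

```c
#include <stddef.h>

/* Limit quoted in the commit message; the macro name is hypothetical. */
#define HW_MAX_XTS_BYTES 4096u

enum xts_path { XTS_PATH_HARDWARE, XTS_PATH_FALLBACK };

/*
 * Hypothetical dispatcher: requests larger than the hardware limit are
 * routed to a software fallback instead of being passed to the device.
 */
static enum xts_path choose_xts_path(size_t nbytes)
{
    return nbytes <= HW_MAX_XTS_BYTES ? XTS_PATH_HARDWARE
                                      : XTS_PATH_FALLBACK;
}
```

A request of exactly 4096 bytes still takes the hardware path in this sketch; only sizes above the limit are diverted.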
The request should actually be handled by the fallback mechanism instantiated by the ccp-crypto module. Add a check to ensure the request size is less than or equal to the maximum supported size and use the fallback mechanism if it is not. Signed-off-by: Tom Lendacky Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit b440f3ae617bd81a5699bbba5120c8f54c456f81 Author: Arnd Bergmann Date: Wed May 18 16:55:56 2016 +0200 crypto: public_key: select CRYPTO_AKCIPHER commit bad6a185b4d6f81d0ed2b6e4c16307969f160b95 upstream. In some rare randconfig builds, we can end up with ASYMMETRIC_PUBLIC_KEY_SUBTYPE enabled but CRYPTO_AKCIPHER disabled, which fails to link because of the reference to crypto_alloc_akcipher: crypto/built-in.o: In function `public_key_verify_signature': :(.text+0x110e4): undefined reference to `crypto_alloc_akcipher' This adds a Kconfig 'select' statement to ensure the dependency is always there. Signed-off-by: Arnd Bergmann Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman commit f32ef5c8e9e847706a3ef96791e14b207914d9e3 Author: Marc Zyngier Date: Thu Jun 2 09:00:28 2016 +0100 irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask commit dd5f1b049dc139876801db3cdd0f20d21fd428cc upstream. The INTID mask is wrong, and is made a signed value, which has interesting effects in the KVM emulation. Let's sanitize it. Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman commit 9be2fa205c9e1337a68847ae38e791bd6b17cc42 Author: Michael Holzheu Date: Thu May 12 18:10:48 2016 +0200 s390/bpf: reduce maximum program size to 64 KB commit 0fa963553a5c28d8f8aabd8878326d3f782045fc upstream. The s390 BPF compiler currently uses relative branch instructions that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj", etc. Currently the maximum size of s390 BPF programs is set to 0x7ffff. If branches over 64 KB are generated, the kernel can crash due to incorrect code. So fix this and reduce the maximum size to 64 KB.
Programs larger than that will be interpreted. Fixes: ce2b6ad9c185 ("s390/bpf: increase BPF_SIZE_MAX") Signed-off-by: Michael Holzheu Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit ebf529182a6da5fcbecf0305b19de4e0cba048fb Author: Michael Holzheu Date: Wed May 11 21:13:13 2016 +0200 s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop commit 6edf0aa4f8bbdfbb4d6d786892fa02728d05dc36 upstream. In case of usage of skb_vlan_push/pop, in the prologue we store the SKB pointer on the stack and restore it after BPF_JMP_CALL to skb_vlan_push/pop. Unfortunately currently there are two bugs in the code: 1) The wrong stack slot (offset 170 instead of 176) is used 2) The wrong register (W1 instead of B1) is saved So fix this and use correct stack slot and register. Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop") Signed-off-by: Michael Holzheu Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit e1c35534e3684e25053f5caf6e032956894e8b1f Author: Ben Dooks Date: Tue Jun 7 17:22:17 2016 +0100 gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings commit b66b2a0adf0e48973b582e055758b9907a7eee7c upstream. The bcm_kona_gpio_reset() calls bcm_kona_gpio_write_lock_regs() with what looks like the wrong parameter. The write_lock_regs function takes a pointer to the registers, not the bcm_kona_gpio structure. 
Fix the warning, and probably bug by changing the function to pass reg_base instead of kona_gpio, fixing the following warning: drivers/gpio/gpio-bcm-kona.c:550:47: warning: incorrect type in argument 1 (different address spaces) expected void [noderef] *reg_base got struct bcm_kona_gpio *kona_gpio warning: incorrect type in argument 1 (different address spaces) expected void [noderef] *reg_base got struct bcm_kona_gpio *kona_gpio Signed-off-by: Ben Dooks Acked-by: Ray Jui Reviewed-by: Markus Mayer Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 9edd6fd1eb92ebd61c84975855335922db632ea5 Author: Russell King Date: Mon May 30 23:14:56 2016 +0100 ARM: fix PTRACE_SETVFPREGS on SMP systems commit e2dfb4b880146bfd4b6aa8e138c0205407cebbaf upstream. PTRACE_SETVFPREGS fails to properly mark the VFP register set to be reloaded, because it undoes one of the effects of vfp_flush_hwstate(). Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to an invalid CPU number, but vfp_set() overwrites this with the original CPU number, thereby rendering the hardware state as apparently "valid", even though the software state is more recent. Fix this by reverting the previous change. Fixes: 8130b9d7b9d8 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers") Acked-by: Will Deacon Tested-by: Simon Marchi Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit da7f1c92042c25048a53b0eaf716fc66aaac62a9 Author: Torsten Hilbrich Date: Tue Jun 7 13:14:21 2016 +0200 ALSA: hda/realtek: Add T560 docking unit fixup commit dab38e43b298501a4e8807b56117c029e2e98383 upstream. Tested with Lenovo Ultradock. Fixes the non-working headphone jack on the docking unit. 
Signed-off-by: Torsten Hilbrich Tested-by: Torsten Hilbrich Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 81999107ce6de995fc532d4fed23a5189ce18eac Author: Kailang Yang Date: Mon May 30 16:44:20 2016 +0800 ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703 commit 6fbae35a3170c3e2b1b9d7b9cc943cbe48771362 upstream. Support new codecs for ALC700/ALC701/ALC703. Signed-off-by: Kailang Yang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit c3fd646bb8c4faab68fed0e751d0e4f623b5dbab Author: Kailang Yang Date: Mon May 30 15:58:28 2016 +0800 ALSA: hda/realtek - ALC256 speaker noise issue commit e69e7e03ed225abf3e1c43545aa3bcb68dc81d5f upstream. Some registers differ between ALC255 and ALC256, and some ALC255 register settings do not fit ALC256. This issue is caused by the LDO output voltage control. This patch sets the right LDO register value. Signed-off-by: Kailang Yang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 1bf80a48ff88552644ede5c953321d667808fa1d Author: AceLan Kao Date: Fri Jun 3 14:45:25 2016 +0800 ALSA: hda - Fix headset mic detection problem for Dell machine commit f90d83b301701026b2e4c437a3613f377f63290e upstream. Add the pin configuration value of this machine into the pin_quirk table to make DELL1_MIC_NO_PRESENCE apply to this machine. Signed-off-by: AceLan Kao Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 1f4b75078c205b34f3fcfd04098d61b77e044e68 Author: Vinod Koul Date: Thu Jun 9 11:32:14 2016 +0530 ALSA: hda - Add PCI ID for Kabylake commit 35639a0e98391036a4c7f23253c321d6621a8897 upstream. Kabylake shows up as PCI ID 0xa171. And Kabylake-LP as 0x9d71.
Since these are similar to Skylake add these to SKL_PLUS macro Signed-off-by: Vinod Koul Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 2cb77b0ad44351869427d6a744bf5791e6a2c100 Author: Paolo Bonzini Date: Wed Jun 1 14:09:21 2016 +0200 KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi commit c622a3c21ede892e370b56e1ceb9eb28f8bbda6b upstream. Found by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 IP: [] kvm_irq_map_gsi+0x12/0x90 [kvm] PGD 6f80b067 PUD b6535067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 [...] Call Trace: [] irqfd_update+0x32/0xc0 [kvm] [] kvm_irqfd+0x3dc/0x5b0 [kvm] [] kvm_vm_ioctl+0x164/0x6f0 [kvm] [] do_vfs_ioctl+0x298/0x480 [] SyS_ioctl+0x79/0x90 [] tracesys_phase2+0x84/0x89 Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85 RIP [] kvm_irq_map_gsi+0x12/0x90 [kvm] RSP CR2: 0000000000000120 Testcase: #include #include #include #include #include #include #include long r[26]; int main() { memset(r, -1, sizeof(r)); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); struct kvm_irqfd ifd; ifd.fd = syscall(SYS_eventfd2, 5, 0); ifd.gsi = 3; ifd.flags = 2; ifd.resamplefd = ifd.fd; r[25] = ioctl(r[3], KVM_IRQFD, &ifd); return 0; } Reported-by: Dmitry Vyukov Signed-off-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit ded4fc623b3c331b847d23f947f457a020f25683 Author: Paolo Bonzini Date: Wed Jun 1 14:09:23 2016 +0200 KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS commit d14bdb553f9196169f003058ae1cdabe514470e6 upstream. MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to any of bits 63:32. 
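The constraint on DR6 and DR7 stated above can be expressed as a simple bit test. This is a sketch for illustration only, not the code from the KVM patch; the helper name `dr_value_is_valid` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when none of bits 63:32 are set,
 * i.e. when the value would not trip the #GP condition described in
 * the commit message for writes to DR6 or DR7.
 */
static bool dr_value_is_valid(uint64_t val)
{
    return (val >> 32) == 0;
}
```

Running a check of this shape when the values are supplied, rather than when they are loaded into the registers, lets an invalid value be rejected with an error instead of faulting later.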
However, this is not detected at KVM_SET_DEBUGREGS time, and the next KVM_RUN oopses: general protection fault: 0000 [#1] SMP CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1 Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012 [...] Call Trace: [] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm] [] kvm_vcpu_ioctl+0x33d/0x620 [kvm] [] do_vfs_ioctl+0x298/0x480 [] SyS_ioctl+0x79/0x90 [] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 RIP [] native_set_debugreg+0x2b/0x40 RSP Testcase (beautified/reduced from syzkaller output): #include #include #include #include #include #include #include long r[8]; int main() { struct kvm_debugregs dr = { 0 }; r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); memcpy(&dr, "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72" "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8" "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9" "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb", 48); r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr); r[6] = ioctl(r[4], KVM_RUN, 0); } Reported-by: Dmitry Vyukov Signed-off-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman commit ce9c0dba5bf3ad4a25a9dc202e36e74d904df61d Author: David Wragg Date: Fri Jun 3 18:58:15 2016 -0400 vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices [ Upstream commit 7e059158d57b79159eaf1f504825d19866ef2c42 ] Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. 
The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 51d7c394605bd5d72e76745def0002dd938ec48b Author: David Wragg Date: Fri Jun 3 18:58:14 2016 -0400 geneve: Relax MTU constraints [ Upstream commit 55e5bfb53cff286c1c1ff49f51325dc15c7fea63 ] Allow the MTU of geneve devices to be set to large values, in order to exploit underlying networks with larger frame sizes. GENEVE does not have a fixed encapsulation overhead (an openvswitch rule can add variable length options), so there is no relevant maximum MTU to enforce. A maximum of IP_MAX_MTU is used instead. Encapsulated packets that are too big for the underlying network will get dropped on the floor. Signed-off-by: David Wragg Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3dc443059897b8a2fa3e3b18f794ee31c0063730 Author: David Wragg Date: Fri Jun 3 18:58:13 2016 -0400 vxlan: Relax MTU constraints [ Upstream commit 72564b59ffc438ea103b0727a921aaddce766728 ] Allow the MTU of vxlan devices without an underlying device to be set to larger values (up to a maximum based on IP packet limits and vxlan overhead). Previously, their MTUs could not be set to higher than the conventional ethernet value of 1500. This is a very arbitrary value in the context of vxlan, and prevented vxlan devices from being able to take advantage of jumbo frames etc. The default MTU remains 1500, for compatibility. Signed-off-by: David Wragg Acked-by: Roopa Prabhu Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 4d82f395bb0597a3aba23387406e2c07332d6ae9 Author: Jakub Sitnicki Date: Wed Jun 8 15:13:34 2016 +0200 ipv6: Skip XFRM lookup if dst_entry in socket cache is valid [ Upstream commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be ] At present we perform an xfrm_lookup() for each UDPv6 message we send. The lookup involves querying the flow cache (flow_cache_lookup) and, in case of a cache miss, creating an XFRM bundle. If we miss the flow cache, we can end up creating a new bundle and deriving the path MTU (xfrm_init_pmtu) from on an already transformed dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down to xfrm_lookup(). This can happen only if we're caching the dst_entry in the socket, that is when we're using a connected UDP socket. To put it another way, the path MTU shrinks each time we miss the flow cache, which later on leads to incorrectly fragmented payload. It can be observed with ESPv6 in transport mode: 1) Set up a transformation and lower the MTU to trigger fragmentation # ip xfrm policy add dir out src ::1 dst ::1 \ tmpl src ::1 dst ::1 proto esp spi 1 # ip xfrm state add src ::1 dst ::1 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b # ip link set dev lo mtu 1500 2) Monitor the packet flow and set up an UDP sink # tcpdump -ni lo -ttt & # socat udp6-listen:12345,fork /dev/null & 3) Send a datagram that needs fragmentation with a connected socket # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345 2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error 00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448 00:00:00.000014 IP6 ::1 > ::1: frag (1448|32) 00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272 (^ ICMPv6 Parameter Problem) 00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136 4) Compare it to a non-connected socket # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345 
00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448 00:00:00.000010 IP6 ::1 > ::1: frag (1448|64) What happens in step (3) is: 1) when connecting the socket in __ip6_datagram_connect(), we perform an XFRM lookup, miss the flow cache, create an XFRM bundle, and cache the destination, 2) afterwards, when sending the datagram, we perform an XFRM lookup, again, miss the flow cache (due to mismatch of flowi6_iif and flowi6_oif, which is an issue of its own), and recreate an XFRM bundle based on the cached (and already transformed) destination. To prevent the recreation of an XFRM bundle, avoid an XFRM lookup altogether whenever we already have a destination entry cached in the socket. This prevents the path MTU shrinkage and brings us on par with UDPv4. The fix also benefits connected PINGv6 sockets, another user of ip6_sk_dst_lookup_flow(), who also suffer messages being transformed twice. Joint work with Hannes Frederic Sowa. Reported-by: Jan Tluka Signed-off-by: Jakub Sitnicki Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 05cbd46be7f1aaa355301e2e12c378cbfdeeeb2a Author: Guillaume Nault Date: Wed Jun 8 12:59:17 2016 +0200 l2tp: fix configuration passed to setup_udp_tunnel_sock() [ Upstream commit a5c5e2da8551eb69e5d5d09d51d526140b5db9fb ] Unused fields of udp_cfg must be all zeros. Otherwise setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete callbacks with garbage, eventually resulting in panic when used by udp_gro_receive(). 
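The requirement that unused configuration fields be zero can be illustrated with a small userspace sketch. It assumes a stand-in structure, `struct tunnel_cfg`, which is hypothetical and not the kernel's configuration structure; `demo_rcv` and `make_cfg` are likewise invented for the example. The point is only that C zero-initializes every member that an initializer list does not name, so callbacks left unset read as NULL rather than stack garbage.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tunnel socket configuration structure. */
struct tunnel_cfg {
    void *user_data;
    int encap_type;
    int (*encap_rcv)(void *pkt);
    void *(*gro_receive)(void *pkt);
    int (*gro_complete)(void *pkt);
};

/* Hypothetical receive callback used only to fill one field. */
static int demo_rcv(void *pkt)
{
    (void)pkt;
    return 0;
}

/*
 * Members not named in the initializer (gro_receive, gro_complete)
 * are zero-initialized by the language, so they are NULL here.
 */
static struct tunnel_cfg make_cfg(void *user_data)
{
    struct tunnel_cfg cfg = {
        .user_data = user_data,
        .encap_type = 1,
        .encap_rcv = demo_rcv,
    };
    return cfg;
}
```

Declaring the structure without any initializer and then assigning only some members would leave the remaining members indeterminate, which is the failure mode the commit message describes.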
[ 72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78 [ 72.695518] IP: [] 0xffff880033f87d78 [ 72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163 [ 72.696530] Oops: 0011 [#1] SMP KASAN [ 72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button proc\ essor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio [ 72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1 [ 72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000 [ 72.696530] RIP: 0010:[] [] 0xffff880033f87d78 [ 72.696530] RSP: 0018:ffff880035f87bc0 EFLAGS: 00010246 [ 72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823 [ 72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780 [ 72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000 [ 72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715 [ 72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000 [ 72.696530] FS: 0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000 [ 72.696530] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0 [ 72.696530] Stack: [ 72.696530] ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874 [ 72.696530] ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462 [ 72.696530] ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70 [ 72.696530] Call Trace: [ 72.696530] [ 72.696530] [] ? 
udp_gro_receive+0x1c6/0x1f9 [ 72.696530] [] udp4_gro_receive+0x2b5/0x310 [ 72.696530] [] inet_gro_receive+0x4a3/0x4cd [ 72.696530] [] dev_gro_receive+0x584/0x7a3 [ 72.696530] [] ? __lock_is_held+0x29/0x64 [ 72.696530] [] napi_gro_receive+0x124/0x21d [ 72.696530] [] virtnet_receive+0x8df/0x8f6 [virtio_net] [ 72.696530] [] virtnet_poll+0x1d/0x8d [virtio_net] [ 72.696530] [] net_rx_action+0x15b/0x3b9 [ 72.696530] [] __do_softirq+0x216/0x546 [ 72.696530] [] irq_exit+0x49/0xb6 [ 72.696530] [] do_IRQ+0xe2/0xfa [ 72.696530] [] common_interrupt+0x89/0x89 [ 72.696530] [ 72.696530] [] ? trace_hardirqs_on_caller+0x229/0x270 [ 72.696530] [] ? default_idle+0x1c/0x2d [ 72.696530] [] ? default_idle+0x1a/0x2d [ 72.696530] [] arch_cpu_idle+0xa/0xc [ 72.696530] [] default_idle_call+0x1a/0x1c [ 72.696530] [] cpu_startup_entry+0x15b/0x20f [ 72.696530] [] start_secondary+0x12c/0x133 [ 72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 72.696530] RIP [] 0xffff880033f87d78 [ 72.696530] RSP [ 72.696530] CR2: ffff880033f87d78 [ 72.696530] ---[ end trace ad7758b9a1dccf99 ]--- [ 72.696530] Kernel panic - not syncing: Fatal exception in interrupt [ 72.696530] Kernel Offset: disabled [ 72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt v2: use empty initialiser instead of "{ NULL }" to avoid relying on first field's type. Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config") Signed-off-by: Guillaume Nault Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 38f56354f4e1cfbaa1f2f10e9acb30f105b70aed Author: Toshiaki Makita Date: Tue Jun 7 19:14:17 2016 +0900 bridge: Don't insert unnecessary local fdb entry on changing mac address [ Upstream commit 0b148def403153a4d1565f1640356cb78ce5109f ] The missing br_vlan_should_use() test caused creation of an unneeded local fdb entry when changing the mac address of a bridge device if there is a vlan which is configured on a bridge port but not on the bridge device. Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f946ceab11c4bede9b3d6c0d56f0964d65bfff65 Author: Yuchung Cheng Date: Mon Jun 6 15:07:18 2016 -0700 tcp: record TLP and ER timer stats in v6 stats [ Upstream commit ce3cf4ec0305919fc69a972f6c2b2efd35d36abc ] The v6 tcp stats scan does not provide TLP and ER timer information correctly, unlike the v4 version. This patch fixes that. Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Fixes: eed530b6c676 ("tcp: early retransmit") Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 721976e93e5d8963d0c937ee236489968bfcfb81 Author: Chen Haiquan Date: Fri May 27 10:49:11 2016 +0800 vxlan: Accept user specified MTU value when create new vxlan link [ Upstream commit ce577668a426c6a9e2470a09dcd07fbd6e45272a ] When creating a new vxlan link, for example: ip link add vtap mtu 1440 type vxlan vni 1 dev eth0 The argument "mtu" has no effect because it is not set in conf->mtu, so the default value is used in the vxlan_dev_configure() function. This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device configuration). Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration) Signed-off-by: Chen Haiquan Acked-by: Cong Wang Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit 13a055d6ca34dcce324ccfab63cc05db726030bc Author: Ivan Vecera Date: Wed May 25 21:21:52 2016 +0200 team: don't call netdev_change_features under team->lock [ Upstream commit f6988cb63a4e698d8a62a1d085d263d1fcc351ea ] The team_device_event() notifier calls team_compute_features() to fix vlan_features under team->lock to protect team->port_list. The problem is that subsequent __team_compute_features() calls netdev_change_features() to propagate vlan_features to upper vlan devices while team->lock is still taken. This can lead to deadlock when NETIF_F_LRO is modified on lower devices or team device itself. Example: The team0 as active backup with eth0 and eth1 NICs. Both eth0 & eth1 are LRO capable and LRO is enabled. Thus LRO is also enabled on team0. The command 'ethtool -K team0 lro off' now hangs due to this deadlock: dev_ethtool() -> ethtool_set_features() -> __netdev_update_features(team) -> netdev_sync_lower_features() -> netdev_update_features(lower_1) -> __netdev_update_features(lower_1) -> netdev_features_change(lower_1) -> call_netdevice_notifiers(...) -> team_device_event(lower_1) -> team_compute_features(team) [TAKES team->lock] -> netdev_change_features(team) -> __netdev_update_features(team) -> netdev_sync_lower_features() -> netdev_update_features(lower_2) -> __netdev_update_features(lower_2) -> netdev_features_change(lower_2) -> call_netdevice_notifiers(...) -> team_device_event(lower_2) -> team_compute_features(team) [DEADLOCK] The bug is present in team from the beginning but it appeared after the commit fd867d5 (net/core: generic support for disabling netdev features down stack) that adds synchronization of features with lower devices. Fixes: fd867d5 (net/core: generic support for disabling netdev features down stack) Cc: Jiri Pirko Signed-off-by: Ivan Vecera Signed-off-by: Jiri Pirko Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman commit 450db517b9223908ab5c1a9b5ddce3b6c4d75b18 Author: Edward Cree Date: Tue May 24 18:53:36 2016 +0100 sfc: on MC reset, clear PIO buffer linkage in TXQs [ Upstream commit c0795bf64cba4d1b796fdc5b74b33772841ed1bb ] Otherwise, if we fail to allocate new PIO buffers, our TXQs will try to use the old ones, which aren't there any more. Fixes: 183233bec810 "sfc: Allocate and link PIO buffers; map them with write-combining" Signed-off-by: Edward Cree Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bfe951d547bf15bf1192abd20773e6603dacadf1 Author: Daniel Borkmann Date: Sun May 22 23:16:18 2016 +0200 bpf, inode: disallow userns mounts [ Upstream commit 612bacad78ba6d0a91166fc4487af114bac172a8 ] Follow-up to commit e27f4a942a0e ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag. The original idea was to have a per mountns instance instead of a single global fs instance, but that didn't work out and we had to switch to mount_nodev() model. The intent of that middle ground was that we avoid users who don't play nice to create endless instances of bpf fs which are difficult to control and discover from an admin point of view, but at the same time it would have allowed us to be more flexible with regard to namespaces. Therefore, since we now did the switch to mount_nodev() as a fix where individual instances are created, we also need to remove userns mount flag along with it to avoid running into mentioned situation. I don't expect any breakage at this early point in time with removing the flag and we can revisit this later should the requirement for this come up with future users. This and commit e27f4a942a0e have been split to facilitate tracking should any of them run into the unlikely case of causing a regression. 
Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Signed-off-by: Daniel Borkmann Acked-by: Hannes Frederic Sowa Acked-by: Alexei Starovoitov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f5f16bf66d7e07e5a04f07226caefeaf3136c83a Author: Nicolas Dichtel Date: Thu May 19 17:26:29 2016 +0200 uapi glibc compat: fix compilation when !__USE_MISC in glibc [ Upstream commit f0a3fdca794d1e68ae284ef4caefe681f7c18e89 ] These structures are defined only if __USE_MISC is set in glibc net/if.h headers, ie when _BSD_SOURCE or _SVID_SOURCE are defined. CC: Jan Engelhardt CC: Josh Boyer CC: Stephen Hemminger CC: Waldemar Brodkorb CC: Gabriel Laskar CC: Mikko Rapeli Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h") Signed-off-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ab1f253ddccc235520fa4f70d32a8dd6bf8ef346 Author: Hannes Frederic Sowa Date: Thu May 19 15:58:33 2016 +0200 udp: prevent skbs lingering in tunnel socket queues [ Upstream commit e5aed006be918af163eb397e45aa5ea6cefd5e01 ] In case we find a socket with encapsulation enabled we should call the encap_recv function even if just a udp header without payload is available. The callbacks are responsible for correctly verifying and dropping the packets. Also, in case the header validation fails for geneve and vxlan we shouldn't put the skb back into the socket queue, no one will pick them up there. Instead we can simply discard them in the respective encap_recv functions. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5b7ea922e1754107f77d146011612f2e42600cc1 Author: Eric W. 
Biederman Date: Fri May 20 17:22:48 2016 -0500 bpf: Use mount_nodev not mount_ns to mount the bpf filesystem [ Upstream commit e27f4a942a0ee4b84567a3c6cfa84f273e55cbb7 ] While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the bpf filesystem. Looking at the code I saw a broken usage of mount_ns with current->nsproxy->mnt_ns. As the code does not acquire a reference to the mount namespace, it cannot possibly be correct to store the mount namespace on the superblock as it does. Replace mount_ns with mount_nodev so that each mount of the bpf filesystem returns a distinct instance, and the code is not buggy. In discussion with Hannes Frederic Sowa it was reported that the use of mount_ns was an attempt to have one bpf instance per mount namespace, in an attempt to keep resources that pin resources from hiding. That intent simply does not work; the vfs is not built to allow that kind of behavior. This means that the bpf filesystem really is buggy both semantically and in its implementation, as it does not, nor can it, implement the original intent. This change is userspace visible, but my experience with similar filesystems leads me to believe nothing will break with a model where each mount of the bpf filesystem is distinct from all others. Fixes: b2197755b263 ("bpf: add support for persistent maps/progs") Cc: Hannes Frederic Sowa Acked-by: Daniel Borkmann Signed-off-by: "Eric W. Biederman" Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bccd56fad0433ef1ed0661bb3a4055503cdec4e9 Author: Jason Wang Date: Thu May 19 13:36:51 2016 +0800 tuntap: correctly wake up process during uninit [ Upstream commit addf8fc4acb1cf79492ac64966f07178793cb3d7 ] We used to check dev->reg_state against NETREG_REGISTERED each time we were woken up. But after commit 9e641bdcfa4e ("net-tun: restructure tun_do_read for better sleep/wakeup efficiency"), it uses skb_recv_datagram() which does not check dev->reg_state.
This causes a problem if we delete a tun/tap device while a process is blocked reading from it: the device will wait forever for the reference count held by that process. Fix this by using RCV_SHUTDOWN, which will be checked during skb_recv_datagram(), before trying to wake up the process during uninit. Fixes: 9e641bdcfa4e ("net-tun: restructure tun_do_read for better sleep/wakeup efficiency") Cc: Eric Dumazet Cc: Xi Wang Cc: Michael S. Tsirkin Signed-off-by: Jason Wang Acked-by: Eric Dumazet Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 835d0122a57ffaea43162d0e7a4f5bf25d2af78b Author: Jiri Pirko Date: Tue May 17 18:58:08 2016 +0200 switchdev: pass pointer to fib_info instead of copy [ Upstream commit da4ed55165d41b1073f9a476f1c18493e9bf8c8e ] The problem is that fib_info->nh is a zero-length array ([0]), so the struct fib_info allocation size depends on the number of nexthops. If we just copy fib_info, we do not copy the nexthops info and the driver accesses memory which is not ours. Given the fact that fib4 does not defer operations and therefore does not need a copy, just pass the pointer down to drivers as was done before. Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects") Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6a58f3e12e8e5c330940b38cc8c98d52171aefa8 Author: Richard Alpe Date: Tue May 17 16:57:37 2016 +0200 tipc: fix nametable publication field in nl compat [ Upstream commit 03aaaa9b941e136757b55c4cf775aab6068dfd94 ] The publication field of the old netlink API should contain the publication key and not the publication reference. Fixes: 44a8ae94fd55 (tipc: convert legacy nl name table dump to nl compat) Signed-off-by: Richard Alpe Acked-by: Jon Maloy Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman commit 49543942beb1b9ca95709d6cfa67708932aa4d11 Author: Herbert Xu Date: Mon May 16 17:28:16 2016 +0800 netlink: Fix dump skb leak/double free [ Upstream commit 92964c79b357efd980812c4de5c1fd2ec8bb5520 ] When we free cb->skb after a dump, we do it after releasing the lock. This means that a new dump could have started in the meantime and we'll end up freeing their skb instead of ours. This patch saves the skb and module before we unlock so we free the right memory. Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: Baozeng Ding Signed-off-by: Herbert Xu Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 23cdd8c3cbe9d790f23d7f9ae14e9b828f56f69c Author: Richard Alpe Date: Mon May 16 11:14:54 2016 +0200 tipc: check nl sock before parsing nested attributes [ Upstream commit 45e093ae2830cd1264677d47ff9a95a71f5d9f9c ] Make sure the socket for which the user is listing publications exists before parsing the socket netlink attributes. Prior to this patch a call without any socket caused a NULL pointer dereference in tipc_nl_publ_dump(). Tested-and-reported-by: Baozeng Ding Signed-off-by: Richard Alpe Acked-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c54c115da7214a41a697180964cf6d7a5a50b599 Author: Ewan D. Milne Date: Tue May 31 09:42:29 2016 -0400 scsi: Add QEMU CD-ROM to VPD Inquiry Blacklist commit fbd83006e3e536fcb103228d2422ea63129ccb03 upstream. Linux fails to boot as a guest with a QEMU CD-ROM: [ 4.439488] ata2.00: ATAPI: QEMU CD-ROM, 0.8.2, max UDMA/100 [ 4.443649] ata2.00: configured for MWDMA2 [ 4.450267] scsi 1:0:0:0: CD-ROM QEMU QEMU CD-ROM 0.8.
PQ: 0 ANSI: 5 [ 4.464317] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 4.464319] ata2.00: BMDMA stat 0x5 [ 4.464339] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in [ 4.464339] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation) [ 4.464341] ata2.00: status: { DRDY DRQ } [ 4.465864] ata2: soft resetting link [ 4.625971] ata2.00: configured for MWDMA2 [ 4.628290] ata2: EH complete [ 4.646670] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 4.646671] ata2.00: BMDMA stat 0x5 [ 4.646683] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in [ 4.646683] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation) [ 4.646685] ata2.00: status: { DRDY DRQ } [ 4.648193] ata2: soft resetting link ... Fix this by suppressing VPD inquiry for this device. Signed-off-by: Ewan D. Milne Reported-by: Jan Stancek Tested-by: Jan Stancek Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 0dec8c0d67c64401d97122e4eba347ccc5850622 Author: James Bottomley Date: Fri May 13 12:04:06 2016 -0700 scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands commit a621bac3044ed6f7ec5fa0326491b2d4838bfa93 upstream. When SCSI was written, all commands coming from the filesystem (REQ_TYPE_FS commands) had data. This meant that our signal for needing to complete the command was the number of bytes completed being equal to the number of bytes in the request. Unfortunately, with the advent of flush barriers, we can now get zero length REQ_TYPE_FS commands, which confuse this logic because they satisfy the condition every time. This means they never get retried even for retryable conditions, like UNIT ATTENTION because we complete them early assuming they're done. 
Fix this by special casing the early completion condition to recognise zero length commands with errors and let them drop through to the retry code. Reported-by: Sebastian Parschauer Signed-off-by: James E.J. Bottomley Tested-by: Jack Wang Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman
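Illustration (not part of the commit): the early-completion condition described in the scsi_lib entry above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the kernel code. The function names, the byte-count parameters and the boolean error flag are all hypothetical and only mirror the wording of the commit message: a command is treated as done when the completed byte count equals the request length, which a zero-length command satisfies trivially.

```python
# Toy model of the completion check described in the commit message.
# All names here are hypothetical; the real logic lives in the kernel's
# SCSI mid-layer and is written in C.

def completes_early_old(total_bytes: int, good_bytes: int, error: bool) -> bool:
    """Old behaviour: done as soon as every byte of the request completed.

    A zero-length request (for example a flush) always satisfies this,
    even when the command failed with a retryable error.
    """
    return good_bytes >= total_bytes


def completes_early_fixed(total_bytes: int, good_bytes: int, error: bool) -> bool:
    """Fixed behaviour: a zero-length request that reported an error is
    not completed early, so it can drop through to the retry code."""
    if total_bytes == 0 and error:
        return False
    return good_bytes >= total_bytes


if __name__ == "__main__":
    # A failed flush: the old check completes it, the fixed check does not.
    print(completes_early_old(0, 0, True), completes_early_fixed(0, 0, True))
```

In this model a successful zero-length command and a fully transferred data command still complete early under both checks; only the failed zero-length case changes.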